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Abstract 

Spatio-temporal modelling is an increasingly popular topic in Statistics. Our paper contributes to this line of research 
by developing the theory, simulation and inference for a spatio-temporal Ornstein-Uhlenbeck process. We conduct de¬ 
tailed simulation studies and demonstrate the practical relevance of these processes in an empirical study of radiation 
anomaly data. Finally, we describe how predictions can be carried out in the Gaussian setting. 
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1 Introduction 

Due to advances in storage and computational efficiencies, more data with spatial and temporal information are being 
collected and shared. Taking the view that a pure temporal or spatial analysis of such data is insufficient, many scientists 
have proposed statistical models to study the spatio-temporal interactions and dependencies (see for example. Section 
6.7 in Cressie & Wikle (2011) for an overview, Bai et al. (2012), Sarkka & Hartikainen (2012) and Davis et al. (2013)). 
We contribute to this research area by extending the theory of a spatio-temporal Levy-driven Ornstein-Uhlenbeck (OU) 
process, and pioneering its simulation and inference. 

This spatio-temporal OU process, which is referred to as the OUa process in Barndorff-Nielsen & Schmiegel (2004), can 
be written as: 

Yt{x) = [ exp(-A(f - s))L(d^,ds), (1) 

J At{x) 

where {^((x) : x 6 Af, f 6 T} is a random field in space-time S = X x T- Usually, we have Af = R'^ for some d 6 N 
and T = R. Similar to the classical OU process, A > 0 acts as a rate parameter. However, to cope with the new spatial 
dimension, we no longer integrate a Levy process over (— oo, t]; instead, we integrate a homogeneous Levy basis L over 
a set j4t(x) C S (with spatial and temporal integrating variables ^ and s). The process is well-defined if the integral 
exists in the sense of the Cq integration theory (Rajput & Rosinski 1989). A summary of this is given in Section 2 of the 
supplementary material provided in Nguyen & Veraart (2016). 

The set At{x) can be interpreted as the region in space-time that influences the field value at (x, t). This is in line with 
the interpretation of an ambit set (see for example Barndorff-Nielsen et al. (2015)). For an OUa process, we require that 
j4j(x) = j4o(0) + (x, t) (Z S for translation invariance. Furthermore: 

Als(x) C Altjx), V s < f, and n (Af x (f, cx))) = 0. (2) 

This implies that rit(x) has a temporal component of (—cx),f] just like the classical case. These conditions on the 
integrating set also give the OUa process several characteristic properties of the OU process: stationarity, Markovianity 
* Address correspondence to: M. Nguyen, Department of Mathematics, Imperial College London, 180 Queen’s Gate, SW7 2AZ London, UK. 
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and an exponentially decaying autocorrelation function (ACF). 

In the literature, there are other definitions of spatio-temporal OU processes. When the driving Levy noise is restricted 
to be Gaussian and Af = R, a spatio-temporal process can be formed from the product of a temporal OU process with 
a spatial one (Traulsen et al. 2004). This is equal to a temporal OU process when the spatial component is fixed and a 
spatial OU process when the temporal component is fixed. Although this model features exponentially decaying temporal 
and spatial autocorrelation, one limitation is that the spatio-temporal autocorrelation is separable. 

Alternatively, if we discretise space, we can create a spatio-temporal OU process by considering a multivariate OU process 
where each vector component of the process corresponds to a spatial location. Such an approach can be found in Brix 
& Diggle (2001). Spatial information is contained in the covariance matrix of the driving Brownian motion. Although 
this also results in a separable spatio-temporal autocorrelation, the spatial autocorrelation is comparatively flexible to that 
obtained by the previous method. 

A further step in this direction would be to discretise the time domain. In this case, the OU process can be represented as 
an autoregressive (AR) model. A three-stage iterative procedure for the space-time modelling of autoregressive moving 
averages (ARMA) models can be found in Pfeifer & Deutrch (1980). The spatio-temporal autocorrelation in this case is 
defined differently and involves spatial neighbours of different “orders”. 

The OU/^ process in (1) has four key advantages over these alternative models. Firstly, it accommodates non-Gaussian 
driving noise which may be more appropriate in practice. Secondly, being a model in continuous time and space, it allows 
us to work with data of varying temporal and spatial scales. Thirdly, it allows for non-separable autocorrelation structures 
defined in the usual way. Finally, its integration set helps us identify the influence region for a particular time and space 
location. This could help scientists better understand the spatio-temporal interactions in real-life phenomena. 

A wide class of OUa processes for Af = R, which we shall refer to as the 5 -class, is given by: 

/ t |•x+g{\t-s\) 

/ exp(-A(f - s))L(d^,ds), (3) 

-00 J X—£r(|t—s|) 

where g is a non-negative strictly increasing continuous function on [0, cx)). Now, At (x) = {(^s) : s < t,x-g(jt-sl) < 
^ < X + g{\t — s|)}. The case g{\t — sj) = c\t — s| with c > 0 is particularly interesting as it is related to an 
inhomogeneous stochastic wave equation (see for example, Dalang (2009) and Chapter 5 of Chow (2007)). When L 
is Gaussian, Zt{x) = exp(Af) 1 ^ 4 ( 3 :) is a mild solution of: 

+ exp{\t)L{x,t), with lim Zt(x) = 0 , and lim = 0 . 
aU ox-^ t ^-00 t->-cx 3 at 

Here, c can be interpreted as the wave speed while the non-linear term exp(Af) can be seen as the source function which 
describes how the sources of the waves affect the medium through which they travel. This effect is randomly perturbed 
by the Gaussian noise L. This linear choice of g is also the canonical example of an OUa process. The triangular shape 
of the integrating set for fixed x and t is thought to be behind the “A” in the process’s name. 


Outline We begin in Section 2 by providing the theoretical background of the OUa process and the definitions of 
concepts to be used later. In Section 3, we review the process’s known attributes as mentioned in Barndorff-Nielsen & 
Schmiegel (2004) and derive new properties including spatio-temporal stationarity, temporal and spatial ergodicity, and 
autocorrelation structures. Sections 4 and 5 look at how we can utilise these for simulation and inference respectively. We 
develop two algorithms for the canonical OUa process based on discrete convolutions. While one algorithm simulates 
on rectangular grids, another simulates on diamond-shaped grids. These will be compared and used in simulation studies 
with two estimation methods. The first is a spatio-temporal extension of the moments-matching inference method in 
Jonsdottir et al. (2013), and the second a least-squares approach to involve more lags of the normalised variograms. In 
Section 6, we fit the canonical process to radiation anomaly data from the International Research Institute for Climate and 
Society to illustrate how OUa processes can be used in practice. We also provide theoretical results for prediction in the 
Gaussian scenario. The paper concludes with a summary of our key findings and an outlook on future research in Section 
7. 
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2 Preliminaries 

We have seen in (1) that an OU/^ process is defined in terms of a stochastic integral over a subset of S with respect to 
a Levy basis. To understand this more concretely, we need to set out several notations. Let S be the Borel cr-algebra 
on our space-time S and Bb(S) = {E 6 S : Xd+i{E) < oo} where A^+i denotes the Lebesgue measure on 
Throughout this paper, we work in the probability space (ff, T,P). A Levy basis is defined as follows (Barndorff-Nielsen 
et al. 2015, Sato 2007): 

Definition 1 (Levy basis). 

L is a Levy basis on {S,S) if it is an independently scattered and infinitely divisible random measure. This means that it 
satisfies the following conditions: 

1. L is a set ofM-valued random variables {L{E) : E 6 Bb{S)}. Let {Ei : i 6 N} be a sequence of disjoint elements 
ofBb{S). Then: 

• /o'" U^i £ Bb(S), L{[y^^^ Ej) = L{Ej) almost surely: 

• furthermore, L{Ei) and L{Ej) are independent for i j. 

2. For any finite collection Bi ,..., of elements ofBb(S), the random vector L = {L{Bi),L{Bm)) is infinitely 
divisible. That is, for any n 6 N, there exists a law such that the law of L can be expressed as p = pff, the 
n-fold convolution ofp„ with itself. 

Remark 1. The realisation of the random measures defined here are not in general measures because they need not have 
finite variation. The reader is advised to note that there are other definitions of random measures in the literature. 

Definition 2 (Cumulant generating function and cumulants). 

Let C{6 :( Z} = logE [exp {iOZ)] denote the cumulant generating function (CGF) of a random variable Z. This means 
that C{9 it Z} is the unique complex-valued continuous function on R such that C{0 it Z} = 0 and exp(C '{6 it Z}) = 
E[exp(i 6 Z)j. More information on the definition of a distinguished logarithm can be found on page 33 of Sato (1999). 
The cumulants of Z, Ki{Z),for I = 1,2,..., are defined through: C{6 if Z} = K‘l{Z)(i9y/l\. 

Definition 3 (Homogeneous Levy basis and its seed). 

A Levy basis L is homogeneous if there exists a random variable L’ such that C{9 if L{E)} = C{9 if L'}d^ds for 

all E E Bb{S). We refer to L' as the Levy seed of L. Its CGF, C{9 \ L’}, follows the Levy-Khintchine (LK) formula: 
ia9 — ^b9'^ -f J]j(exp(i 6 < 2 ) — 1 — <i)i^(dz) where a 6 R, 6 > 0 and v is a Levy measure, i.e. it has no atom at 0 

and satisfies fg min(l, z^)u(dz) < oo. We call [a, b, v) the LK triplet of L'. 

The relationship between the CGFs of L(E) and L' was established in Proposition 2.4 of Rajput & Rosinski (1989). The 
link between the CGFs of L' and the resulting OUa process will be derived in Theorem 9. 

In this paper, we will be using OUa processes with homogeneous Gaussian bases for illustration. For such a basis, 
L{E) ~ N{p\d+i(E),T'^\d+i(E)) for any E 6 Bb{S). Here, p and are the mean and variance of the Levy seed 
respectively. Additional simulation studies have been conducted using homogeneous inverse Gaussian (IG), the normal 
inverse Gaussian (NIG) and the Gamma bases. Interested readers can refer to the supplementary material provided in 
Nguyen & Veraart (2016). 

As previously mentioned, the Levy basis L and its seed play important roles in the distribution of the resulting OUa 
process. This can be characterised by a spatio-temporal extension of the generalised cumulant functional in Barndorff- 
Nielsen et al. (2015): 

Definition 4 (Generalised cumulant functional and cumulants). 

Let Y = {kt(x)}xgR<i teR denote a stochastic process in space-time (S = R'* x R), and let v denote any non-random 
measure such that the integral v{Y) = Yt(x)z;(dx, df) exists almost surely. The generalised cumulant functional 
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(GCF) ofY with respect to v is given by: C{6 J r'(y)} = logE [exp (idv (1^))] where the logarithm is the distinguished 
logarithm. 

Ifv(dcx., dt) = Jt(dt)5x(dx) where St and S^. denote the Dirac measures at t and x respectively, C{9 J r'(y)} is CGF of 
Yt{^). 

As mentioned in Barndorff-Nielsen et al. (2015), there are at least two other interesting cases of the measure v. 

(i) if z;(dx, dt) = 9iSt^ (df)Jxi (dx) H— ■ + (df)(5x„ (dx), C{0 is the joint cumulant generating function 

(JCGF) of yt„(x„); 

(ii) and if z;(dx, df) = l/(x, f)dxdf, where I is a region in S' = x R and l/(x, f) = 1 if (x, t) E I and 0 otherwise, 
C{9 j: u(l^)} determines the distribution of Yj(x)dxdf. 

Note that the case (ii) is relevant if we are using OUa processes to study integrated volatility or intermittency. 

The next few definitions are required to study the theoretical properties of the OUa process and conduct inference. First, 
we extend the definition of (strict) temporal stationarity to that of spatio-temporal stationarity. 

We say that Yjjx) has temporal stationarity if for every x 6 R^*, {yt(x)}jgR is a strictly stationary temporal process. 
That is, for every n € N and e 6 R, such that fi,..., 6 R, the joint distribution of (x),..., (x) is the same as 

that of Yjj+e(x),..., Yj^+e(x). Analogously, Utjx) is said to have spatio-temporal stationarity if for n 6 N, u 6 R'^ and 
e 6 R, such that x\, ...,Xn € R'^ and ti, 6 R, the joint distribution of (xi),..., Yt^{TCn) is the same as that of 
Yjj+^(xi + u),..., Yj^+^(x„ -f u). 

For our ergodicity results, we also need the classical definition of ergodicity, here given in the notation of Fuchs & Stelzer 
(2013). Fet {AtltgR be a real-valued strictly stationary process defined on the probability space ((R)*, P) with 
T = B((R)*). We say that {Xt)teR is ergodic if P(A n S^B)dt '^4°° P{A)P{B), where A, B e P. Here, 

{S*)teR is the induced group of shift transformations on (R)®. That is, S*Xs = x^-t for any and f 6 R. (Xt)jgR 

is mixing if P{A n S^B) *4°° P{A)P{B). 

3 Theoretical properties 

In this section, we review the known theoretical properties of OUa processes, as discussed in Barndorff-Nielsen & 
Schmiegel (2004), and derive various new results. These are important both from a modelling point of view and for 
developing statistical inference. The proofs of the Theorems, if not given or referenced, are provided in the Appendix. 

3.1 Stationarity and Markovianity 

In Barndorff-Nielsen & Schmiegel (2004), it was proved in slightly different notation, that: 

Theorem 1. An OUa process {Yj(x)}tg 7 - can be decomposed as: 

Yt{x.) = exp (-Af)Y'o(x) + Ut{x) + Ut(x), (4) 

where Ut{x) = exp (—Af) / exp (As)L(d^, ds), 

J At(-x.)r]{X X ( —c!o,0])\xlo(x) 

Vj(x) = exp (—Af) / exp (As)L(d^, ds). 

JAt{x.)r]{X X (0,oo)) 

In representation (4), Yo(x), {(/t(x)}t>o and {Vt(x)}t>o are independent. Furthermore, {Yt{'x.)'\teT is Markovian and 
stationary. 

Stationarity is an important feature for inference as it allows us to pair observations based on temporal distances and 
consider quantities such as the sample autocorrelation. Markovianity is also a highly desirable property as it can be used 
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to speed up simulations, make step-ahead predictions and simplify likelihood computations. 

Besides temporal stationarity, Barndorff-Nielsen and Schmiegel also give an intuitive reasoning for y’t(x)’s spatial sta- 
tionarity. This involves the homogeneity of the Levy basis and the ambit set assumptions. However, by using the GCF, one 
can show that actually has spatio-temporal stationarity. By definition, this includes temporal and spatial stationarity. 


Theorem 2. Let A = Ao(0). Assume that for all ^ E X = and s € T = K, 

(?,«)= / 1 a(C “ s — f) exp(—A(f — s))u(dx, df) < cx), 

Js 

and that s) is integrable with respect to the Levy basis L. Then, the GCF ofY with respect to v can be written as: 


C{eXv{Y)}= / C7{0/rA(?,s)1:L'}d^ds 
Js 

= ida [ hA(^, s)d^ds —-6^b ( s)d^ds-f f ( {exp{i6uz) — I — i 0 uzl\ 2 \<i)t^(dz)x{du), 

Js 2 Jg Jr Jr 

where (a, h, v) is the LK triplet of L' and x is the measure on R obtained by transforming the Lebesgue measure on S by 
the mapping (^, s) —> s). 


Proof This is analogous to the proof for Proposition 5 in Barndorff-Nielsen et al. (2015) with Ha being defined differently 
to account for space-time and our definition of Y) (x). □ 

Theorem 3. Let Yt(x) be an OU/\ process. Then Yt{x.) has spatio-temporal stationarity. 


Proof. Let w(dx, df) = YJJi=i (df)<5xi (dx). From Theorem 2, the JCGF of (xi), • • • , Y(^ (x„) can be written as: 

P n 

C{dtv{Y)} = / C{9hA{^,s)tL'}d^dswherehA{^,s) = ^lA{^-Xi,s-ti)exp{-X{ti- s))ei. (5) 
Js 


By using the change of variables u = ^ — xi and e = s — fi, we have C{9 J w(Y')} = C{9h'j^{u, e) if L'jdwde, 

where h'^{u, e) := 1a(u + xi — x^, e -f G — f) exp(—A(fi — G — e))9i. The JCGF does not depend on the space 

or time coordinates but only their differences. This means that Yt{x) has spatio-temporal stationarity. □ 


3.2 Autocorrelation structures 


There is a wide range of literature on suitable choices of spatio-temporal autocorrelation and covariance functions (see 
Section 6.7 of Cressie & Wikle (2011) for a review). Hence, we show what type of autocorrelation structures can be 
obtained in our parsimonious class of OUa processes when the Levy basis L is assumed to have finite second moments. 
With spatio-temporal stationarity, we can compute the spatio-temporal autocorrelation of such an OUa process. Since L 
is independently scattered, we find that, in terms of spatial and temporal differences (dx € R'^ and dt 6 R): 


Cov[yt(x), Yt+dt (x -I- dx)] 
^ Corr[yt(x), Y)+d,(x -f dx)] 


Var[L'] exp(—Adt) / exp(—2A(f — s))d^ds. 

J At(x)nAt+rf, (x+dx) 

exp(-Adt) X4,(x)nA,+.,(x-Hdx) exp(-2A(f - s))d$ds 
fA,(^) exp(-2A(f - s))d^ds 


( 6 ) 


This provides information for both the temporal and the spatial autocorrelation. For the former, we set dx = 0 to get: 


■.= Corr['Yt(x),'Yt+d,(x)] = exp(-A|dt|). 


This is identical to the autocorrelation of a classical OU process. For the spatial autocorrelation (dx), we set dt = 0. 
The actual form depends on our ambit set At(x), as is shown in the next example. 
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Figure 1: Plot of the spatial autocorrelations achieved by different ambit sets of the form At{x) = {(^ 15 ) ’■ s < t,x — 
g{\t “ si) < 5 < a; + g{\t — s|)} where: (i) g{\t — sj) = c\t — sj; (ii) g{\t — sj) = cijf — sj if — sj < 1, and 
Cl + C 2 {\t — sj — 1) Otherwise. FF refers to the functional form of for < 2ci. 


(i) P® 


(ii) P® 




Example 1. Consider an OUa process of the g-class, i.e.; Yt{x) — Jx-g(\t-s\) 6 x;p(—A(f — s))L(d^, ds), where g 

is a non-negative strictly increasing continuous function on [0, cx)). Assume without loss of generality that dx > 0, and 
let {x*,t*) = {x — g~^ (^)) be the intersection point between At{x) and At{x -|- dx)- Then: 


Cov[Yt{x), Yt{x + dx)] = Var[L'] 




— oo Jx-\-dx—g{\t—s\ 

t* 


exp(—2A(f — s))d^ds 


Var[L'] j f 2g{t — s) exp{—2\{t — s))ds — f exp(—2A(f — s))ds 

\J —OO J —OO 

:Var[L']^^ 2fl + e“2^(“+9“'('^))dw-^ d^e“2^(“’+3"'('^))dr 


(7) 


where w = t* — s. We give two examples to show how the spatial autocorrelation structure varies according to our choice 
of the ambit set: 


(i) If g{\t — s\) = c|f —s|, g ^ {dx/2) = dxl2c. By solving the integrals in (7), we find that p^^\dx) = exp (—Ad^^/c)- 


(ii) Ifg{\t-s\) = 


ci\t - s| 


if\t- s\ < 1 , 


Cl + C 2 (|f — sj — 1) Otherwise, 

By solving (7) for dx < 2ci and dx > 2ci separately, we have: 


g 1 {dx/2) = 


dx/‘It:\ if dx Y 2 ci, 

1 + otherwise. 

^ 2 C 2 


P^^\dx) 


Cl + (C2 —Ci)e-2^ 


^^g-2A(l + (da;-2ci)/2ci) 
ci+{c 2 -ci)e- 2 ^ 


Otherwise. 


for dx < 2 ci, 


Figure 1 shows the plots of the spatial autocorrelations for cases (i) and (ii) under different values of A, c, ci, C 2 and dx- 
From Plot (ii), we notice that introducing a change in gradient of g(|f — sj) from 0.5 to 1 at |f — s| = 1 results in changes in 
the overall scaling and gradient of the spatial autocorrelation. In particular, since the spatial correlation is a combination 
of two functions which are joined at dx = 2 ci = 1 , there is a change of behaviour at that point as denoted by the deviation 
from the first function (FF). Since we can choose different values of A, c, ci and C 2 to fit the curvatures of our empirical 
spatial autocorrelations, we see that OUa processes provide a flexible tool for modelling spatial dependencies. 

In the canonical case with g(|f—s|)=c|f—sj,we can find an explicif form for the spatio-temporal autocorrelation: 
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Example 2. Let Yt{x) be an OU/\ process described by: 

Yt{x)=f f exp(-A(i - s))L(dC,ds), (8) 

•/—oo Jx — c\t—s\ 

where c > 0. By considering the intersection of At{x) and At+dt(,x A dx) under different relative magnitudes ofdx and 
cdt, we find that the spatio-temporal autocorrelation is given by: 

GoYT:\Yt{x),Yt+dt{x + dx)] = min ^exp (-A|dt|), exp j j . (9) 

This is a non-separable ACF which cannot be obtained through the definitions of the spatio-temporal OU process in Brix 
& Higgle (2001) and Traulsen et al. (2004). Note that when dx = 0, we get pG^\dt) = exp(—A|(it|) as expected and 
when df = 0, we get p^^\dx) = exp (—Ajd^j/c) as calculated before. We have exponential decay in time and in space 
but with different rate parameters. Interested readers can refer to Section 3.1 of the supplementary material in Nguyen & 
Veraart (2016) for the details of this derivation. 


Example 3. The spatio-temporal ACF derived in the previous example can be used to obtain the JCGF of an OUa 
process of the form (8) with a Gaussian basis. The CGF of a Gaussian Levy seed L' with mean p. and variance is 
C{6\ L'} = ip9 — t^6^/2. Let s) = 1a(C ~ Xi,s — ti) exp(—A(U — s))6i where A = j4o(0), the ambit set 

at the origin and define 6i := 99 i. With reference to the notation in the proof of Theorem 3, the JCGF is given by: 


[ C{9hAi^,s)tL'}d^ds 
JrxR 

=ip'S^9i / exp(—A(U — s))d^ds — ^ -T^9i9j / exp(—A(ti-|-— 2s))d^ds 

i=l JAtfixi) ■> An(xi)r\AtAxj) 


,2c/i 


i=l i,J=l 


fexp(-AjU -tj|) ,exp 


using (6) and (9). 


3.3 Equality in law to the OU process 

In the previous subsection, we saw that the temporal autocorrelation of an OUa process is exactly that of an OU process. 
In many circumstances, we also find that when its spatial parameter is fixed, the OUa process is equal in distribution to 
an OU process. This is one key reason the OUa process can be seen as a spatio-temporal OU process. Before we present 
the corresponding Theorem, we give a Lemma which is used in its proof: 

Lemma 1. Let ti < t 2 € M. be two arbitrary time points, and let Yt{x) be an OUa process. Then, for fixed x 6 R'*.’ 

■^^(x) = exp(-A(f 2 - fi))Ui(x) -f (10) 

where (7(fi,f2,x) = (x)\A, (x) “ s))U(dUds) is independent of Yt^{x). 

Proof. Since As(x) C j4j(x). Vs < t, At^i'^) H = Atj(x) and can be partitioned into Atj(x) and 

(x)\Altj (x). The result follows directly by using the definition of U(x) and the fact that the Levy basis is indepen¬ 
dently scattered. □ 
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Theorem 4. Let Tt(x) = exp(—A(i — s))L(d^, ds) be an OUa process where the Levy seed L' has LK triplet 

(a, b, u), and let Zt = flea sxp(—A(t — s))L(ds) be an OU process where the Levy seed L' has LK triplet (d, h, v). 
Then, for fixed x 6 R'*, the process {Yt(x)}tgR is equal in law to if: 


d = \ 


[ a- [ zl 

J A L Jm. 


l<|z|<e-^“ v{dz) 
0 


dudw + / e"™ / Zll<|.|<e-- i>(d 2 )dw 

J— oo Jm. 

b = 2X f be^^™dudw and ( dy)dw = f ^(e“'^™dt/)dudw;, 

J A J — oo J A 


where A = j4o(0), u = ^ — x and w = s — t. 


Example 4. Let Yfix) be an OUa process of the g-class and suppose that the Levy seeds L' and U have Levy densities, 
i.e. there exists measures and such that v{dy) = i>d{y)dy and v{dy) = vci{y)dy. We can find an explicit formula 
for V by translating the third condition of Theorem 4 into e~^'^i'd{e~^™y)dw = fl,y^ 2g(—vj)e~^™iyci(e~^™y)du!, 
Vy 6 R. By substituting z = e~^^y, we have: 


Ay 

D{dz) = 2 


Ay" 

(Az)-! 


log(^/ti) 

A 


log(g/l/) 

A 


Sgjw 

Sw 


j Vdiz)dz ^ J Vd{z)dz = J 2g 

i^d(y)dy^ dz -I- g{0)u{dy) 


Vd{z)dz (11) 


D=X 1 log(z/y) 


where we have used Leibniz’s Integral Rule to differentiate both sides of (11) with respect to y. 


Example 5. Let Yfix) be a canonical OUa process as described in (8). If L is a Gaussian basis whose seed has mean p 
and standard deviation T,forfixed x, Yfix) = Zf where Zt is an OU process defined by Zt = fl^ exp(—A(t — s))L(ds). 
Here, L is a Gaussian basis whose seed has mean fi = 2cp/\ and standard deviation f = 

In general, the Levy measure of L, v is related to that of L, v via a transformation depending on A and the ambit set. This 
means that the distribution of the Levy process L corresponding to Zt may not be of the same family as that of the Levy 
basis L corresponding to Yt{x). 


3.4 Time-changed versions 

Time-changed versions of classical OU processes have been studied by Barndorff-Nielsen and Shephard. These were 
introduced to eliminate dependence of the cumulants on the parameter A (Barndorff-Nielsen & Shephard 2001). This 
means that A only appears in the ACLs and can be interpreted as a memory parameter. Lor the OUa process, we have an 
additional spatial dimension. As a result, we have more than one way to introduce a time change. Note that these methods 
are based on particular classes of OUa processes. The first time-change method is taken directly from the temporal idea: 

Definition 5 (Time-changed OUa process of order q). Let L, A and Alj(x) satisfy the conditions in the definition of an 
OUa process. kt(x) is an time-changed OUa (TCOUa) process of order g 6 N i/!' 

Yt(x) = I exp{—X(t — s))L{d^, X^ds), and the integral is well-defined. 

Theorem 5. Consider the following (q -\- 1)* order TCOUa process: 

/ exp(-A(f-s))L(dC,A«+Ms), (12) 

-OO J x—c\t—s\'i 

where g 6 N A chosen such that the ambit set satisfies the conditions (2). The cumulants ofYfix) do not depend on X. 
Another way to eliminate the A dependence in the cumulants is to define a first-order TCOUa process with time-changed 
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ambit sets. This means that At{x) depends on the parameter A. 


Theorem 6 . Let g be a non-negative strictly increasing continuous function on [0, oo). We can construct an TCOU/\ 
process as follows: 


ft fx-Vg{)^\t-s\) 

Ytix) = I / exp(-A(t - s))L(d^, Ads), 

-OO t/x— 5 '(A[t—s|) 


(13) 


't(tc )= / r 

J —oo jX- 

where x, i 6 R and A > 0. Yt{x) has cumulants which do not depend on A. 

The equality in law of the OUa and the OU processes at fixed locations can also be preserved after a time-change: 


Theorem 7. Let T)^(x) be a TCOUa process of the type (12) where the Levy seed L[ has LK triplet (oi, 6 i, ui) and 
let y)^(x) be a TCOU^ process of the type (13) where the Levy seed L'^ has LK triplet (a 2 ,b 2 ,t' 2 ). Further define 
Z} = exp(—A(f — s))Z/i(Ads) and exp(—A(f —s))L 2 (Ads) to be time-changed OU (TCOU)processes 

with corresponding Levy seed LK triplets (di, 6 i, Ti) and ( 0 , 2 , 621 ^ 2 ) respectively. Then, for fixed x 6 R'*, the process 
{■U/(x)}tgR is equal in law to {Z^jteR if: 


di= 2cw^e 

Jo 


ai — zl 


l<z<e' 


>ui(dz) 


dw d- 


e / zli^z<e^vi(dz)dw 


61 = 4 / bicw'^e ^^dw and / i>i(e™dy)dw = / 2cw^^i(e™dj/)dw, 

Jo Jo Jo 

where u = x — ^ and w = t — s. Similarly, for fixed x 6 R'*, the process {y)^(x)}tgR is equal in law to if-' 

poo r /* "I ^00 p 

a 2 - zli<z<e-i^ 2 (dz) dw-\- e~'^ / zli^z<e^d 2 {dz)dw 

Jw J Jo Jr 


a 2 = / 2g{w)e 
Jo 


62 = 4 / b 2 g{w)e ^^dw and / i> 2 {e^dy)dw = / 2 g{w)h' 2 {e^dy)dw. 

Jo Jo Jo 

Example 6 . When L'-^ and have Levy densities and respectively, we can find explicit formulae for ui and z> 2 - 
Similar to Example 4, we can write the Levy measure conditions in terms of Levy densities and use Leibniz’s Integral 
Rule. This gives us: 


vi{dy) = 2cqy ^ f (\og{z/y))‘’ ^ ui(dz)dy and V 2 {dy) = 2 y ^ f 
J y J y 


dg(y) 

dx 


\x=\o^{z/y) 


V2(dz)dy + g(d)v2(dy) 


As mentioned, working with TCOU a processes allows us to interpret the parameter A solely as the memory parameter. 


3.5 Ergodicity 

Stationarity and autocorrelations are useful for estimating the parameters of an OUa process. Another useful property is 
ergodicity which is typically used to derive a Weak Law of Large Numbers. This in turn is used to establish the consis¬ 
tency of the estimators. We find that the g-class of OUa processes possesses this property in time and space: 

Theorem 8 . Let Yt{x) be defined by (3). Then, {Y((x)}tgR and {Tt(x)}a; 6 R are ergodic. 


3.6 Probability distributions 

Cumulants, together with normalised variograms, are used to conduct inference on OUa processes in Section 5. We 
present a theorem relating the CGL of an OUa process lt(x) to its Levy seed L': 


9 
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Theorem 9. With A = j4o(0), u = ^ — x,«; = s — t and {a, b, v) being the LK triplet of L', the CGF ofYtlpt) can be 
written as: 


C{e t yt(x)} := log (E [exp (i6>yt(x))]) = / t L'jd^ds = [ t L'}dudw (14) 

J At{x) Ja 


= iOav - -O'^by + / - 1 - i6lzl|^l<i)^y (dz), 

^ Jm 


where ay = [o — fg 2 li<| 2 |<e-Ami'(dz)] dudw, by = be^^^dudw and vy(dy) = iy{e ^™d?/)dudw. 


Proof. We can obtain (14) by considering a special case of Theorem 2. To find the LK triplet {ay, by, vy), we write (14) 
in terms of the LK triplet of L' and evaluate each resulting integral. For ay, this involves restricting the integration bound 
for the inner integral to cases where l|ze-Amj<i — l|z|<i is non-zero. For vy, we use y = z exp(Aw) and change the order 
of integration so that we integrate over u and w before integrating over y. □ 

Example 7. For the canonical OUa process given by (8), suppose that L is a Gaussian basis whose seed has mean p and 
standard deviation t. Then, we obtain that Yt{x) ~ N {2cpf ,ct‘^I 2\‘^) since: 

C{eXY^{x)} = I* "'c{0exp(-A(f-s))1:L'}d^ds = i0(^^j 


As the cumulants of lt(x) are the coefficients in the power series of its CGF, we can use Theorem 9 to derive expressions 
for the cumulants of yt(x), ki {Yt (x)) for I 6 N. For the canonical OUa process, it can be shown that: 

Ki{Yt{x)) = ki{L') f [ exp(-/A(f - s))d^ds = k/(L') /" 2c(f - s) exp(-iA(f - s))ds = 

Here, ki{L') refers to the cumulant of the Levy seed. 


4 Simulation 


In this section, we develop two simulation algorithms for the canonical process defined by (8). Let Yt{x) be a canonical 
process. By reducing the stochastic integral to a truncated sum, we obtain its discrete convolution (DC) approximation. 
This can be computed efficiently by the in-built convolution algorithms in software such as R and Mathematica. Based 
on this idea, we create two algorithms: one for values on the usual rectangular grid and another for values on a so-called 
diamond grid to mimic the edges of the integration set. The former can be easily extended to more general OUa processes. 
By writing w = t — s and u = a; — (8) is equal to: 


n x-\-cw poo poo 

exp{—Xw)L{dXt — dw) = — / / h{u,w)L(x — du,t — dw) 

— cw Jo J —oo 

h{u, {du, dw), 


poo poo 


(16) 


where := —L{x — f — •) is the Levy basis given by ^{A) = —L{{x, t) — A) for A 6 Bb{S), and h{u, w) = 
1|m|<cii; exp (—Aw). Figure 2(i) shows the plot of h{u, w) when c = 1. 

Since h{u, w) —> 0 as w —> ±cx) and w —> cx), it seems reasonable to approximate (16) by the following DC: 


Yt{x) 

3=0 i=-q 


(17) 


10 
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Figure 2: (i) Plot of the kernel h(u, w) = l|„|<cm exp (—Aw) when c = 1 for 0 < w < 4, —4 < u < 4; (ii) plot of the 
rectangular simulation grid set by spatial and temporal coordinates {xj ■.1 = 1,n} and {tj : J = 1, ...m} and the 
extension to accommodate the kernel approximation for the boundaries. The original grid points are denoted by points 
while the additional ones are denoted by crosses. Here, n = m = A and p = q = 2. The Levy noise over the extended 
simulation grid is represented by the Ws which are attached to the grid points on their left. For the grid point 
the kernel is evaluated at the points within the shaded area. The shaded area is an approximation for the ambit set (now in 
u, w form). The true transformed ambit set is outlined by the bold red line. 

(i) (ii) 




tm ts ^2 tl 


Figure 3: (i) Field view of data simulated from the rectangular grid algorithm for Yt{x) when g(w) = w, X = 1 and L is 
a Gaussian basis whose seed has mean 0.2 and standard deviation 0.1; (ii) a heat map for the same data. The simulation 
grid is over the space-time region [0,10] x [0,10] with grid size A„ = = 0.05 and kernel truncation parameters 

p = q = 300. 

(i) (ii) 
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where wj = j^w and Ui = zA^ for some A^, A„ > 0, and p and q are arbitrarily chosen integers. : —q < 

i < q,0 < j < p} is a set of independent identically distributed random variables which depend on These 

(x t) 

represent the Levy noise over each grid segment. For example, for the rectangular grid algorithm, we take ~ 

L([0, Au] X [0, Au,]) for all i, j. The approximation in (17) is expected to improve as A^,, A^ —> 0 andp, g —> cx). 

4.1 Rectangular grid algorithm 

Referring to (17), we can use the optimised convolution function for vectors in the R software to create a simulation 
algorithm for Yt(x). Alternatively, one can use any similar function in other software such as the convolution function for 
lists in Mathematica. Since such functions are typically written in terms of Discrete Fourier Transforms, it is also possible 
to code using these directly. 

To begin, we choose p, q, A^ and A^,. We create a list of row vectors of values of h: 

h ={[h {u-q,wo) ,h{u-q,wi ),.. .,h{u_q,Wp)] [/i(wo, Wo) ,h{uo,wi ),.. .,h{uo,Wp)], 

...,[h {Uq, Wo) , h {Uq, Wl) {Uq, Wp)]}. 

Next, we choose our simulation grid by determining the spatial and temporal coordinates {xi : I = 1, and {tj : 
J = 1,..., m}. These need to be separated by A^ and A^ respectively. To account for boundary effects, i.e. to be able to 
use the entire kernel approximation for boundary points, we need to extend the grid by q on both ends of the spatial axis 
and p on one end of the temporal axis. In Figure 2(ii), we consider the case n = m = 4, p = q = 2 and c = 1. Simulating 
with such a discretisation scheme means that there are three sources of simulation error: the kernel truncation, the kernel 
discretisation and the block approximation of the ambit set. 

Let h = n + 2q, rh = m + p and denote the realisations of Wki for fc = 1,..., n and I = 1,..., m. We create a list of 
row vectors of these values: W = {[wn, wi 2 , ..., wi*] ,..., [wni,Wn 2 , • • •, Wnm]}- 

With hi as the z* row vector in h and Wk as the fc* row vector in W, we can simulate from Yt{x) using Algorithm 1. 
Step 1.6 is equivalent to calculating y = [YtA^i), ■ ■ ■ ,Yt^{xi)] withYtj{xi) = 
where i = i + q + 1, and j = J + j for J = 1,... ,m. 

Figure 3 shows some data simulated over the space-time region [0,10] x [0,10] for Yt{x) when c = 1, A = 1 and L 
is a Gaussian basis whose seed has mean 0.2 and standard deviation 0.1. The step sizes were A„ = Au, = 0.05 and 
p = q = 300. 

4.2 Diamond grid algorithm 

Simulating the process on a square or rectangular grid requires us to approximate the triangular ambit set by blocks. 
This will affect the temporal and spatial autocorrelations which are dependent on the approximated intersection areas (see 
Section 3.2). We want to better approximate the shape of our ambit set and their intersections by simulating on a grid that 
follows the shape of the ambit set. We also want to be in the DC framework for computational efficiency. To this end, we 
consider a diamond grid (DG) built on the original rectangular grid (RG). 

We start with a motivating example. Suppose that we have the equi-spaced square grid {{xi,tj) : i = 1,... ,n,j = 
1,..., m} with n = m = 5. We want to simulate values for Yt{x) when c = 1. Figure 4(i) shows how one may construct 
a DG on this square grid. The new grid consists of diamonds with height twice the spatial spacing and width twice 
the temporal spacing. As before, we need to extend the grid by q and p units to account for boundary effects. 

With c = 1, we require that the left corner of the diamond has an upper slope of 1 and a lower slope of —1. Using 
diamonds to approximate the ambit set allows us to capture the ambit shape more accurately. However, one consequence 
is that we cannot obtain Y values for the points within the diamonds. We can only obtain Y values for points at the 
edges of the diamond. That is, our simulated data only consists of field values at {xi,tj) where i = 1,..., 5, and 
j = 1, 3, 5 if z is odd and j = 2,4 if z is even. 


12 
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Algorithm 1 Discrete convolution on a 

rectangular grid 

1 

Y rep{NA, m) 

> We will append rows to Y as xi increases to form our data matrix. 

2 

for I = 1,..., n do 


3 

y rep( 0 , m) 

> Lor each xj, we create a storage vector. 

4 

for i = 1,.. . ,2q + 1 do 


5 

j -i- I + i-l 


6 

p <— p -b convolveiWj, hj, 

type — /') > We add the convolutions of hj with TUp_|_t_i to y. 

7 

end for 


8 

Y rbind{Y,rev(yvect)) 

> Append y in reversed form so the time index increases from the left. 

9 

end for 


10 


> Remove the first dummy row of Y. 


The choice of the grid shape affects all parts of our simulation. To account for the distribution of the Levy noise within 
the diamonds, we do not simulate values for W from independent L([0, A^,] x [0, A^]) random variables. Instead, we 
scatter the Ws uniformly about the DG. If the Ws lie within diamonds, we simulate values for it from independent Levy 
random variables whose distribution is proportional to the area of the diamond. If the W s lie at the edges of the diamonds 
(denoted by the circles), we set their values to 0 . 

Next, we construct the truncated kernel matrix with the missing Y values in mind. Lor p = g = 2, we define: 


/h{u- 2 ,wo) 

0 

HU-2,w2)\ 

0 

h{u-i,wi) 

0 

h{uo,wo) 

0 

h{uo,W 2 ) 

0 

h{ui,wi) 

0 

\ h{u2,wo) 

0 

h{u2,W2) ) 


where h{u,w) = l|u|<„ exp(—Aiu). Ligure 4(ii) shows the action of the kernel matrix on the grid of Ws for the 
simulation of Yt^ (2:3). The blue shaded area denotes the approximated ambit set At^ (0:3) and the blue cross denotes its 
starting point. To correctly weigh the Levy noise in the shaded area by the kernel values, we overlay the area with the red 
dotted lines whose intersections denote the values in the truncated kernel matrix. The value for (0:3) is then found by 
multiplying the overlaid kernel values with their respective W values. 

Note that by using the DC algorithm, we will automatically generate a value for the points within the diamonds as they 
lie on the square grid. However, the alternation with the Os in the kernel matrix cancels out any values generated for these 
points. To see this, consider Yt^ (X3) and shift the red dotted lines to the right by At. Now, the non-zero kernel values are 
mutliplied with the zero-valued W s and the kernel zeros are multiplied with the non-zero W s. 

Now that we have an intuition for the DG algorithm, we can construct it. Upon further examination, we find that in order 
for steps to be easily extended to other cases, we require p and q to be even integers, n and m to be odd integers and 
Aj: = cAf. Under these conditions, we can create the algorithm for general c. 

The pseudo code in Algorithm 2 constructs the {2q -|- 1) x (p -|- 1) truncated kernel matrix: 

9 . ] exp(-Aw;j) ifi+jiseven, 

J^ij — < 

I 0 otherwise, 

where Wj — j At for j — 0^... ,p and Ui = iAj; for i — —g,..., g. This is later split to create a list of the rows of H, h: 

h = {[h (m_,, Wo) , h {u-q, Wi) {u-q, Wp)] . ,[h{uo,wo) ,h{uo,wi). ,h {uo,Wp)] , 

...,[h {Uq, Wo) , h {Uq, Wl) {Uq, Wp)]}. 

With our truncated kernel matrix, we proceed to simulate from the Levy random variables: 

— I L([0, 2cAt] X [0, At]) if/c-b / is even, 

Wki = < 

0 


13 


otherwise. 
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Figure 4: (i) Plot of the diamond grid for the case c = 1 with n = m = 5 and p = q = 2 (the Ws denote the underlying 
Levy random variables); (ii) the simulation of for the case c = 1 with n = m = 5 and p = q = 2. The 

shaded blue region denotes the approximated ambit set and the dotted red lines denote the truncated kernel matrix. The 
intersection of the lines show the kernel value to be multiplied by the W at that location. 

(0 (ii) 




tm t4 ta t 2 ti 


tm t4 t3 t2 t, 


Algorithm 2 Obtaining the truncated kernel matrix for the diamond grid algorithm 

Require: At > 0, = cAt,p, q mod 2 = 0, n, m mod 2 = 1 


1 

X seq{0, n * A^, by = A^,) 


2 

t - 1 - seg(0, m * At, by = At) 

> Construct the underlying rectangular grid. 

3 

w -1— At = 1 = seq{0,p, by = 1) 


4 

u ^ A^; * seq{-q, q, by = 1) 

> Set the coordinates of the truncated kernel matrix. 

5 

H <— matrix{0, nrow = 2 * q + l,ncol = p-\-\) 


6 

for i = l,...,2=i=g+ldo 

> We fill the matrix H. 

7 

for j = 1,... ,p -I-1 do 


8 

fc -1— i -f j 


9 

if k mod 2 = 0 then 


10 

Hij t- h{ui,'Wj) 


11 

end if 


12 

end for 


13 

end for 


14 

h ■(— split{H,row{H)) 

> We split H into a list of its rows. 


Figure 5: The finer diamond grid embedded within the rectangular grid which is represented by the red dots. Here, c = 2, 
m = 10 and n = 5. 



tl t2 ts t4 ts te t? ts tg tio 
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for fc = 1, n and I = 1,m. Denote the realisations by Wki- We create a list of row vectors of these values: 

w = {[wil, Wl2, ■ • ■,Wlrh\ , • • • , [wsi, Wn2, • • • 

All that is required now is to perform the DC of h and W by row. The steps are identical to that in the RG algorithm 
(Algorithm 1). Finally, with the data matrix V, we label Y/j where / + J is odd, as missing values. 

Instead of having missing values at every alternate space-time point, we can also obtain field values on a RG by simulating 
on a finer DG but later discarding data on alternate rows and columns. Figure 5 shows an example for c = 2. 

Remark 2. Using the DC approach to simulation incorporates a moving window and an efficient in-built algorithm. 
Alternatives include a definition-based algorithm with double summations and a cut-off point or an algorithm based on 
the Markov property. However, these involve quadratically or linearly increasing number of operations. 


4.3 Mean squared errors 

As they give discrete approximations to our continuous field, both the RG and DG algorithms involve simulation error. 

Theorem 10. Let Yfix) be a g-class OUa process, and Zf{x) be its DC approximation provided by the RG algorithm. 
Writing L' for the Levy seed of L and At (x) for the relevant ambit set (in the definition of Y), we have that: 

E [|Yt(a;) - 

= ( [ [fc(a:, f, s) —f, s)] E [L'] d^ds^ + f (fc(a;, f, s) —/i(x, f, s))^ Var [L'] d^ds, (18) 

\JmxM j ^MxM 

where k{x^ 5 ) = s) exp (—A {t — 5 )) and: 

V q 

j = Q i = -q 

We note that the proof for Theorem 10 (in the Appendix) works for any approximation of an OUa process. We only need 
to be able to formulate /i(x, t, s) appropriately. Thus, in principle, one can find the mean squared errors (MSEs) for RG 
algorithms simulating general OUa processes and DG algorithms simulating canonical OUa processes. We present the 
MSE for the latter case: 

Theorem 11. Let Yfix) be a canonical OUa process with shape parameter c, and Zfix) be its DC approximation 
provided by the DG algorithm. We have that (18) applies with: 

p j 

h{x,t,(,,s) = EE exp ( Aj At) [l(f-(j+l)At,t-jAi] ('5)lp+c(22-j)At-c(t-jAt-s),x+c(22-j)At+c(f-jAt-s))(0 

j=0i=0 

+ l(t-(j+2)At.t-(j+r)A,](s)l[x+c(2i-j-l)A,-Hc(t-jAt-s).a:-Hc(2i-j+l)A,-c(t-jA,-s))(e] • 

The MSEs in Theorems 10 and 11 are affected by two different asymptotics: At, —> 0 and Rt = pAt,Rx = 
qAx —> oo. We illustrate this in the following example, whose computational details can be found in Section 3.2 of the 
supplementary material provided in Nguyen & Veraart (2016): 
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Figure 6 : (i) MSEs of the canonical OUa simulations, as a function of the grid size for fixed kernel truncation bound 
i? = 15 (in black: the RG algorithm with rectangle width A; in red: the DG algorithm with diamond width 2A; and in 
blue: the DG algorithm with diamond width A); (ii) MSEs as a function of R for fixed grid size 0.05; (iii) MSEs as a 
function of the grid size for R = 0.05/A;(iv) the corresponding three simulation grids and the definition of ’’grid size” 
in each case (in black: the RG; in dotted red: the DG; and in blue: the finer DG). Here, \ = c = 1, and the mean and 
standard deviation of the Levy seed are 0.2 and 0.1 respectively. The coloured dotted lines refer to the respective non-zero 
asymptotic limits when A —> 0 in Plot (i) and i? —> oo in Plot (ii). Since the algorithms share the same limit for the first 
case, only one limit is shown. 


(i) 


(ii) 




(iii) 



(iv) 



<-> 

Grid Size 


Example 8. Let Y-t{x) be the canonical process with c = 1, and let the mean and variance of the Levy seed be p. and 
respectively. Fix p = q. At = = A and R = pA. By using series identities and expansions, we find that for the RG 

algorithm: 


4p'^ 


E[\Yt{x)-Zt{xf^= ^{l + \Rf + ^{l + 2XR) + 0{A) for fixed R, 


andE |^|■Kt(a:) - ^ | 1 - 


A^A^ 


(1 — 


1 -I- e 


-AA-| 


+ T^ 


2A2 


_L llAfA)-. 


AA 


4A2A2e-2^2X 

(l_g-2AA)2 


„-AA 


A 2 


A 


V(A) 


2AA 


1 — e“2AA 
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In a similar way, we find that the DG MSE and its limits are: 


E [\Yt{x) 
E [\Yt{x) 


Zt{x)\^] 


^(l + RfX^ + ^(l + 2\R) 


e -|- 0{A) for fixed R, and 





4A2A2 

(l_g-2AA)2 


When A = 1, the MSEs of the RG and DG algorithms converge to the same limit with the same order A when A —> 0 
(here, the Big O notation has been used). Due to the infinite series in A and R, it is difficult to compare the MSEs 
analytically. Instead, we compute them numerically and plot the results for two scenarios: decreasing grid size with 
R = 15 and increasing R with grid size equal to 0.05. For both cases, we set p, = 0.2 and t = 0.1. 

From Figure 6(i), we see that the MSEs of both the RG and the DG algorithms decrease to their common limit when the 
grid size decreases. So far, A for the DG algorithm refers to the grid size of the underlying RG. If one defines the DG 
grid size as A, the diamonds would have width 2 A (see the red dotted lines in Figure 6(iv)). In this case, the MSE of the 
DG algorithm is larger than that of the RG algorithm for grid sizes more than 0.02, but less than that of the RG algorithm 
for grid sizes less than 0.02. It seems that, when the grid size decreases beyond 0.02, the ambit set approximation error 
of the RG decreases more slowly than the kernel discretisation error of this DG. If instead, one defines the DG grid size 
to be the distance to the nearest spatial or temporal neighbour, the DG with the same grid size as the RG in Figure 6(iv) 
would be the one in blue. In this case, the MSE of the DG algorithm is consistently lower than that of the RG algorithm 
as can be seen from Figure 6(i). 

Figure 6(ii) shows the behaviour of the MSEs when R increases from 3 to 15. Instead of a monotonic decrease as R 
increases, we observe that the MSEs for both the RG and the DG algorithms drop sharply before slowly rising to their 
asymptotic limits (denoted by the dotted lines in their corresponding colours). This illustrates the interaction between 
kernel truncation and kernel discretisation errors. When we fix the grid size but increase R, kernel discretisation error 
increases as more terms are involved but kernel truncation error decreases. Both changes occur at decreasing rates due 
to the exponential decay of our kernel function. Eventually, these changes become insignificant and MSEs level out when 
R is large enough (around 10 here). 

Based on our computations, the RG and DG MSEs converge to zero only if A —> 0 and R —> oo. Ifp = [l/A^] so that 
R oc 1/A and i? —> cx) fli A —> 0, the MSEs in both cases converge to zero at the order A. (Equivalently, we can fix 
A oc 1/i? and let R —> oo.) This can be seen by replacing the Rs in the MSE expressions by K/A for some constant K. 
In Figure 6(iii), we illustrate the RG and DG MSEs for R = 0.05/A, X = 1, p = 0.2 and t = 0.1. In this plot of the 
MSEs against grid size, the MSEs of the DG algorithms are consistently lower than that of the RG algorithm. This arose 
from our choice of 0.05 for the proportionality constant between R and A. If instead we choose this to be 1, the R values 
for A < 0.05 would be greater than 20. Thus, there will be an insignificant change in MSE due to the increasing R. 
Instead, the MSE behaviour will be dominated by that caused by the change in grid size and the resulting plot will look 
very similar to Figure 6(i). 


5 Inference 

In this section, we focus on inference techniques for the canonical OUa process Yt(x) defined by (8). 

5.1 Moment-matching inference 

We consider a spatio-temporal extension of the moments-matching (MM) method in Jonsdottir et al. (2013). This involves 
matching the theoretical forms and the empirical values for the normalised spatial and temporal variograms, as well as the 
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cumulants of Yt(x). The normalised spatial and temporal variograms are defined as: 


:= 

and := 


E [(Ft {x) -Yt{x- d,)f 
Var [Yt (x)] 

E [(Ft {x) - Ft_,. (x))^] 

Var [Ft (x)] " 


= 2 (l-p(®)(d,)) = 2 

2 (l-p(^)(dt)) = 2 (l 


— exp 



- exp(-A(it)). 


(19) 

( 20 ) 


Suppose we have data {Fv : v 6 V} where V is the set of space-time observation sites. Let N{dx,dt) be the set 
containing all the pairs of indices of sites with spatial distance d^ > 0 and temporal distance dt > 0 , and let K 2 (Vv) 
denote the empirical variance in our observations. We can estimate yl’®) {dx) and 7 )^! (dt) by: 


¥^Hdx) = 


\N{dx,0) 


(i,j)eN(d^,o) 


(Vvt - Vv 
K 2 (Fv) 


and = 


|V(0,dt 


(z,3)eN{S),dt) 


(Vvt-Vvf)^ 

K2(Fv) 


( 21 ) 


By matching the empirical and the theoretical forms, we can estimate A and c by A = — dt ^ log 
c = -Xdx/log {l- 7 ^'^^ (dx)/2). 

Having found A and c, we use the cumulants of Ft(a;) to estimate the parameters of the Levy basis L. For I = 1, 2,3, and 
4, we can estimate ki (Yt (x)) non-parametrically by (see for example, page 391 of Kendall et al. (1987)): 

K,(Yt(x)) = K 2 (Yt(x)) = (DS 2 - Sf), « 3 (Vt(x)) = - 3 DS 2 S 1 + 25f), 

and K 4 (Yt(x)) = - 4(D^ + d9)535i - 3(D^ - D)Sl + I 2 DS 2 S! - 65^], 

where D denotes our sample size and Sk = ■ Now, we can match the estimates with the theoretical representa¬ 

tions and solve for the basis parameters. We give an example for the Gaussian basis: 


1 - 




and 


Example 9. Let L be a Gaussian basis whose seed has mean /i and variance t^. With A = A and c = c, we obtain; 


- ., 2 c Ki(Ft(a;))A^ 

«i(Ft(a:)) = ^ p = 


and K 2 (Ft(x)) = t'^^— => f = 
2X^ 


2K2(Vt(x))A2 


5.2 A least-squares adaptation 

The MM method introduced in the previous subsection involved using the normalised variograms at single lags to find A 
and c. Instead, we can compute (19) and (20) at several lags and use least-squares (LS) to fit the values to the theoretical 
curves. By fitting the empirical temporal variogram to its theoretical form in (20), we can find A. With this, we can find c 
by fitting the empirical spatial variogram to the expression in (19). We can then proceed as in the MM method to find the 
Levy basis parameters. This approach is useful for the cases where we have more than two parameters in the normalised 
variograms such as case (ii) in Example 1. 

The LS methodology can be considered as a variant of the Generalised Method of Moments (GMM). These two methods 
are similar in that they minimise error functions involving “moment” conditions. However, there is a key difference. In the 
LS method, we used normalised variograms to separate the effects of the Levy basis parameters and the autocorrelation 
parameters (A and c). We first estimated A and c before plugging our estimates into the expressions for the cumulants of 
our process to estimate the basis parameters. In a GMM framework, however, we cannot use normalised variograms or 
cumulants as we have to work with ergodic averages of functions computed from our data. This means that we cannot 
separate the effects of the basis and autocorrelation parameters. Instead, we have to combine all the “moment” conditions 
involving variograms and moments of our data into one optimisation criterion and find all the estimates simultaneously. 
High computational effort is required to navigate through the high dimensional surface of this optimisation criterion. More 
work is required to see how a GMM approach can be applied to OUa processes. 
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Figure 7: Theoretical and empirical curves of: (i) the normalised temporal variogram; (ii) and the normalised spatial 
variogram. The black curves denote the theoretical curves, while the red and blue curves denote the moments-matching 
fit and the least-squares fit using 15 lags respectively. Each lag corresponds to 0.05 units. 
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d, dx 

5.3 Properties of our estimators 


Theorem 12. Suppose that we have lattice data, {Y^. : = {xj,tk),k = 1,... ,m, j = 1,... ,n}, for a canonical 0\J/\ 

process. Then, the MM and LS estimators for the model parameters are consistent under increasing domain asymptotics. 

The proof of this asymptotic property can be found in the Appendix. To investigate properties of our estimators under 
finite samples, we can conduct simulation experiments. For illustrative purposes, we assume a Gaussian basis and fix 
A = c = 1. We use the RG algorithm with At = = 0.05 and p = q = 300 to generate 500 data sets covering the 

[0,10] X [0,10] space-time region. Similarly, we use the DG algorithm (with diamond width 2At) to generate 500 data 
sets. For both cases, each dataset took about 27 seconds to create on a PC with characteristics: Intel® Core™i7-3770 
CPU Processor @ 3.40GHz; 8GB of RAM; Windows 8.1 64-bit. 

We obtain the MM and LS estimates of the 500 RG and DG datasets. For the LS procedure, we consider 15 time and 
space lags for the normalised variograms, each lag being 0.05 units. For each lag, these were calculated using (21). 
Figure 7 shows the LS fit of the theoretical forms, (19) and (20), to the empirical values in blue for one RG dataset. From 
the fitted normalised temporal variogram, we obtain A = 1.037; from the fitted normalised spatial variogram, we obtain 
A/c = 1.090 so that c = 0.951. 

The results for the 500 RG datasets are shown in the first two columns of Figure 8. The red lines in the plots denote the 
true values of the parameters. We notice that while the MM estimates of c in Plot (f) always lie below the true value 1, 
the LS estimates of c in Plot (e) are centred around 1. This can be explained as follows: unlike the MM method which 
only fits the normalised variogram at the first lag, the LS method fits 15 lags. This has the effect of pulling the estimated 
variograms closer to the true curves in most cases such as that in Figure 7, where the blue curve lies closer to the black 
than the red. 

This increase in accuracy, however, comes at the expense of higher standard deviations. While the MM estimates for c in 
Plot (f) have a range of about 0.03, the LS estimates have a range of about 0.5. Such an increase in standard deviation is 
also seen for the DG data sets whose results are shown in the last two columns of Figure 8. While the MM estimates for 
c in Plot (h) have a range of about 0.07, the LS estimates in Plot (g) have a range of about 0.9. As a result of using the 
estimated normalised variogram at multiple lags, there are more sources of variation in the LS method. In addition, the 
estimated normalised variograms at higher lags incur larger standard deviations due to the smaller proportion of data used 
for their calculation. 

We have seen that despite their inherent errors, our DC algorithms are useful for revealing properties of our MM and LS 
estimators. In the next subsection, we use our inference methods to tell us more about our simulation algorithms. 
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Figure 8: Box plots of the parameter estimates from the LS and MM methods under the RG and DG algorithms in the 
case c = 1 and L being a Gaussian basis. The true values are denoted by the red lines. 
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5.4 Investigating ambit set approximations in onr algorithms 

For fixed At, the DG algorithm (with diamond width 2A) exhibits larger kernel discretisation error than the RG algorithm. 
However, by construction, it approximates the ambit sets of the canonical OUa processes better. This results in a better 
representation of the autocorrelations and variograms of the process. We use the MM results from the 500 RG and DG 
data sets to illustrate this. 

From the box plots of the MM estimates of the RG data in the second column of Figure 8, we see that reasonable results 
are obtained for all parameters except c. There seems to be a consistent underestimation of c. In comparison, reasonable 
results are obtained for all parameters including c in the last column of Figure 8 where the MM method is applied to the 
DG data. As c is obtained directly from the estimated spatial normalised variogram, this implies that mimicking the edges 
of the triangular ambit sets of the canonical OUa process through DGs instead of approximating it by blocks through 
RGs, allows us to better capture the spatio-temporal dependencies of our process. 

The better ambit set approximation of the DG algorithm, as well as the bias-variance trade-off of the FS method mentioned 
in the previous subsection, were also observed in cases where c was set to 2 or 0.5 and L to IG, NIG or Gamma bases. 
The associated box plots can be found in Section 4 of the supplementary material provided in Nguyen & Veraart (2016). 


6 Application to radiation anomaly data 

Space-time ARMAs have been used for satellite ozone data and suggested for other global coverage data including those 
on cloudiness (see for example, Niu & Stein (1992)). Since OUa processes can be viewed as continuous extensions of 
space-time ARMAs and cloud cover can be inferred from outgoing longwave radiation (OLR), we apply OUa processes 
to the radiation anomaly data provided by the International Research Institute for Climate and Society (2015). 

6.1 Fitting the canonical model 

The spatial component X of our data is longitude in degrees east (°E) with X = {1.25, 3.75,..., 356.25, 358.75}. The 
temporal component T consists of sets of five days starting from 1-5 January 2014 to 18-22 September 2014 so that 
T = {1, 2,..., 52, 53}. In total, we have 144 x 53 = 7632 observations of five-day average OFR anomalies averaged 
over the latitudes 5 degrees south to 5 degrees north, measured in watts per square metre. These anomalies are calculated 
with respect to the base period 1979-1995 and capture the changes in precipitation patterns over the time. Specifically, 
negative anomalies imply increased cloudiness and an enhanced likelihood of precipitation, and positive anomalies imply 
decreased cloudiness and a lower chance of precipitation. As this relationship between OLR anomalies and precipitation 
is most reliable in tropical regions, our data only covers the latitudes near the equator. A heat map of our data can be 
found in Figures 10(i) and ll(i). Since no deterministic trend or seasonal component was found, we proceed to fit the 
canonical model to the data. 

Autocorrelation structures are a key feature in space-time ARMAs and OUa processes. Figures 12 and 13 show the 
time series and ACF plots of our data at three spatial points, and the spatial series and ACF plots at three temporal 
points respectively. In the ACF plots, the red curves represent the fitted ACF from the MM estimation while the blue 
curves represent the fitted ACF from the LS adaptation using 15 lags. The fits seem rather similar for the temporal series 
in Figure 12. Indeed, the estimates obtained for the rate parameter A from the MM and LS methods are quite close, 
1.081 and 1.033 respectively. In comparison, the LS fit in Figure 13 seems to be much better than that of the MM fit. 
By considering more lags for the normalised spatial variogram, we have c = 14.819 instead of 31.504, a substantial 
difference. The LS approach may be preferred in this setting as we are aiming to fit the empirical autocorrelations well. 
These values of A and c are important as they tell us how much weight to give to the information on neighbouring locations 
and which neighbouring space-time locations to consider. 
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Figure 9: (i) Empirical densities of the radiation anomaly data (in black) and the simulated data sets combined (in red for 
the Gaussian basis and blue for the NIG basis); (ii) log densities of the same data and simulated data sets combined. 
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Figure 10: Heat plot of: (i) the radiation anomaly data; (ii) the Gaussian simulated data with random seed 1. 
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Figure 11: Heat plot of: (i) the radiation anomaly data; (ii) the NIG simulated data with random seed 1. 
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Figure 12: Properties in time for fixed spatial locations of the radiation anomaly data. Time series and ACF plots for: (a)- 
(b) Y’t(1.25); (c)-(d) 1^4(118.75); and (e)-(f) Y't(238.75). From our definition of T, the time lags are the index differences 
between the pentads considered. In the ACF plots, the black vertical bars denote the empirical autocorrelation values, 
while the red and blue curves represent the ACFs arising from the moments-matching estimation and the least-squares 
approach respectively. 
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Figure 13: Properties in space for fixed temporal locations of the radiation anomaly data. Spatial series and ACE plots for: 
(a)-(b) Yi{x); (c)-(d) Y 2 e{x)', (e)-(f) p 53 (x). Each spatial lag is of 2.5°E. In the ACE plots, the black vertical bars denote 
the empirical autocorrelation values, while the red and blue curves represent the ACEs arising from the moments-matching 
estimation and the least-squares approach respectively. 
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We continue with the LS estimates for A and c, and calculate the empirical cumulants of the Levy seed, Since we 

have both positive and negative anomalies, we consider the Gaussian and NIG models. For the Gaussian basis, we have 
jl = -0.00598 and f = 5.827; for the NIG basis, we have a = 0.0765, /3 = -0.0260, 5 = 2.161 and /t = 0.775. 

In order to assess the goodness-of-fit of the marginal distribution, we simulate from the fitted models using the DG 
algorithm. We set Aj = 0.1, = cAj, n = 219, m = 521, p = 300 and q = 30. Figure 9 shows the densities and 

the logarithms of the densities computed from the data (in black), 100 simulations of the fitted Gaussian model (in red) 
and 100 simulations of the fitted NIG model (in blue). From Figure 9, the model with the NIG basis seems to do a better 
job at capturing the peak and tail behaviour of the process. Note that comparing the extreme tail behaviour is not very 
informative as our OLR data does not contain many observations with extreme values. 

Using the DG algorithm, we can simulate from our fitted models to compare the data generated to the original. The 
heat plots of the simulated data from one run are placed alongside the heat plots of the OLR data in Figures 10 and 11. 
Note that alternate coordinates of the simulated data were used to create these plots so as to avoid the missing values. 
Although we cannot judge the appropriateness of the fitted models based on individual runs, insight can be obtained 
from the comparison. As diagonal lines seem to outline diamond-like structures in the simulated data, one can say that 
the fitted canonical OUa processes give linear approximations to the spatio-temporal dependencies observed in the OLR 
phenomena. The simulated data also exhibit clusters of high and low values, a key feature of the original data. The heat 
plots from nine other simulation runs can be found in Section 5 of the supplementary material, Nguyen & Veraart (2016). 

6.2 Prediction under the Gaussian assumption 

In the case of the Gaussian canonical OUa process, predictive distributions at new space-time locations can be computed: 

Theorem 13. Let Yt{x) be a Gaussian canonical OUa process with rate parameter A, shape parameter c, and whose 
Levy seed L' has mean and standard deviation p and t. Suppose that we have n observations at different space-time 
locations {Yj = Yt^{xi) : i = 1,... ,n}, and we want to predict Yq = Ytg{xo), the field value at a new space-time 
location. 

Let Y = (Yi,..., Yjj) and v = {x, t). We define ry (uoi v) to be the 1 x m correlation vector between Y (vq) and Y and 
let Ry to be the m x m correlation matrix for Y such that the covariance matrix of {Yq, Y).' 

2A2 yry(vo,v)'^ Ry j 

With 1„ denoting the n-dimensional column vector of ones, the predictive distribution ofYfY is N{p*, E*) where: 

p* = E[Yo|Y] = ‘^+ry{vo,Y)Rffi ^Y - . andT.* = Var[Yo|Y] = (l - ry( uq, v)i?“Vy( vq, v)"^) . 

p* and E* can be seen as the best predictor and its prediction error variance respectively. 

Proof. From Example 3, we have an explicit expression for the JCGF of (Yq, Y): 

(7{0t(Yo,Y)} ^ ^ min ^exp (-A|U - |), exp ^ 

i=0 i,j=0 ^ ^ 

This means that (Yg, Y) ~ N{p, E) where p = 2c/rA“^l„+i and E is a positive-semidefinite and symmetric covariance 
matrix with E^- = cff (2A^) ^ min (exp (—A|U — tff), exp {—X\xi — xfi/c)). By using conditioning arguments on a 
multivariate normal distribution, we obtain the desired result. □ 

In the preceding analysis, the OUa process was used directly as a spatio-temporal model. Alternatively, in line with its 
origins, it can be used as a model for spatio-temporal stochastic volatility in other modelling frameworks. For example, in 
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the setting of a dynamic Cox process, it can be used, in place of the spatio-temporal OU process in Brix & Diggle (2001), 
to model the intensity process. Slightly modified versions of OU^ processes have also been used in multiplicative models 
for turbulent casades (Barndorff-Nielsen & Schmiegel 2004, Schmiegel et al. 2005, Hedevang & Schmiegel 2013). These 
exponential spatio-temporal OU models have been shown to reproduce the scaling behaviour of energy dissipation. 


7 Conclusions and Outlook 

The spatio-temporal Ornstein-Uhlenbeck (OU) process which we introduced and referred to as the OUa process, is an 
example of how we can extend models in time to space-time and yet retain desirable features of the original model. In 
Section 3, we have shown that the properties of stationarity, exponentially decaying ACEs and ergodicity transfer from the 
original temporal OU process to the OUa process. OU and OUa processes can be further linked through their equality in 
law when the spatial parameters are fixed. Time-changing, which has been done in the study of OU processes to remove 
the distributional dependence on the rate parameter A, can also be applied to OUa processes. The additional spatial 
dimension of the OUa process, however, makes it more flexible than the OU process. Thus, there is more than one way 
to establish a time change. The greater flexibility of OUa processes is also seen in our ability to obtain different spatial 
autocorrelations by choosing different ambit sets. 

In the second part of our study, we tackled the problems of simulation and inference. We created two simulation algorithms 
for the canonical OUa process: one generating values on a rectangular grid (RG) and the other generating values on 
a diamond grid (DG). Concurrently, using the property of stationarity and explicit forms for the autocorrelations, we 
developed a moments-matching (MM) estimation method for OUa processes. A least-squares (LS) extension where the 
normalised variograms were evaluated at more than one lag to estimate the rate and shape parameters (A and c) was also 
considered. 

In our analysis, we showed that the mean squared errors (MSEs) of our simulation algorithms decreased with grid size 
as expected. However, the errors only converge to zero if the kernel truncation range simultaneously increases to infinity. 
It was also seen that the DG algorithm consistently incurs less MSE than the RG algorithm if we define the grid size 
to be the distance to the nearest spatial or temporal neighbour. The superiority of the DG algorithm in simulating the 
canonical process is also seen through its ability to reflect the spatio-temporal dependencies more accurately. Reasonable 
results for c, which determines the spatial autocorrelation structure, were obtained from both inference methods for the 
data generated on a DG. On the other hand, there was consistent underestimation of c when we applied the MM method 
to data generated on the RG. In our study, we chose the DG algorithm to mimic the edges of the triangular ambit sets. It 
may be trickier to find an appropriate grid shape for more general OUa processes. 

The MM and LS inference methods, which we developed, have been proven to result in consistent parameter estimates. 
In our simulation study, it was also shown that the LS adaptation exhibits a bias-variance trade-off. This is obvious in the 
case of the RG data as the c estimates obtained were centred about their true values but with higher standard deviations. 
After evaluating our simulation and inference techniques, we looked at how OUa processes can be used in practice. As 
an empirical illustration, we applied our estimation methods to radiation anomaly data from the International Research 
Institute for Climate and Society. Good fits of the ACEs were obtained using the LS approach. This suggests that despite 
the bias-variance trade-off, the LS adaptation of the MM method may be preferred over the original in an empirical 
context. On another note, the good fits of the ACEs highlighted the relevance of our class of OUa processes for empirical 
work and the use of such processes as modelling tools. Explicit formulae for the best predictor and the prediction error 
variance in the case of a Gaussian canonical OUa process are also provided. 

In this paper, we studied inference methods based on moments of the OUa process. One topic for future research would 
be to design other inference techniques. In Example 3, we worked out the explicit form of the joint cumulant generating 
function for points of our canonical OUa process when the Levy basis is Gaussian. This can be used to identify a 
multivariate normal distribution and its likelihood. Thus, we can conduct maximum likelihood estimation. If no further 
considerations are made, the optimisation procedure would involve inverting high dimensional matrices and would be 
computationally expensive. It would be interesting to develop ways to bypass these operations through, for example. 
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pair-wise likelihood or composite likelihoods. Pseudo-likelihood methods can also be considered for non-Gaussian bases. 
Another way forward from our research would be to study the theory, simulation and inference techniques for supOU^ 
processes. These processes are obtained by randomising the memory parameter in the OUa process, and would allow us 
to model long memory instead of exponentially decaying autocorrelation. 


Supplementary Material Supplementary material for this paper can be found in Nguyen & Veraart (2016). This 
contains a summary of the integration theory in Rajput & Rosinski (1989), the details behind Examples 2 and 8, as 
well as additional results from our simulation and empirical studies. 
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Appendix: Proofs of Theorems 


Proof of Theorem 4. We first show that the given conditions make Y't(x) = Zt for fixed t and x. From Theorem 9, we 
know that the LK triplet of Yjjx) is (ay, 6y, vy) such that ay = [a — dudru, by = 

be^^™dudw and r'y(dj/) = i'(e“^™dt/)dudw. Similarly, one can show that the LK triplet of Zt is (a^, hz, vz) 

where az = dA“^ — zli^\z\<e->"^v{dz)dw, bz = b(2A)“^ and vz{Ay) = v{e~^'^dy)dw. The given 

conditions on b, b and P imply that ay = az, by = bz and vy = vz- Thus, Tt(x) = Zt. 

Next, we prove that the chosen P is a valid Levy measure, i.e. min(l, z^)i'(dz) < oo. Since yt(x) is a well-defined 
OUa process: 

oo>2\ [ min(l, 2 ^)^y (dz) = 2A f [ min(l, z^)i>(e“^’"d 2 )dw 

Jm. J — oo Jwl 

> 2X f f min(exp(2Aui), 2 ^)i>(e“^“d 2 )dw; = f min(l, r/^)i>(dj/), 

J — OO Jm. Jm. 

where y = exp{—Xw)z and the first equality holds via Fubini’s theorem. 

We now show that Yt{x.) = Zt as, & process in t for fixed x. Let n 6 N and let t\ K. • • • <i. tji denote arbitrary time points. 
Since both processes {Y'j(x)}tgR and {Zt}tev. are Markov, their finite dimensional distributions can be written as: 

n 

/(lj,(x),...,yt„(x)) = n/(yt,(x)|lj._,(x)) x/(lj,(x)), (22) 

i=2 
n 

arxdf{Zt,,...,Zt^) = \{f{Zt,\Zu_,) x f (Zt,). (23) 

i=2 

Here, / (TtJTtJ denotes the conditional distribution of Yt, given Yt,_, and / (YjJ denotes the marginal distribution of 
Yt,. Analogous notation has been used for Zt- We already know that Yt,(x) = Zt,. So, to show that the functions 
(22) and (23) are the same, we need to prove that the conditional distributions Yt,\Yt,_, and Zt,\Zt,_, are the same for 
arbitrary *6(2,..., n}. 

To motivate this, we work with Yt„(x.) and yjj(x). From Lemma 1, Yt„(yi) = e~^^*'^~*'^'^Yt,(yi) + U(ti,t 2 ,x) where 
U{ti,t 2 ,x) = (x)\At (x) ds) is independent of (x). Similarly, we find that Zt^ = (x)-|- 
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V{ti^t 2 ) where V{ti^t 2 ) = e ®^L(ds) is independent of Zt^. Since = Zt^ and (x) = it follows 

thatC/(fi,f 2 ,x) = V{h,t 2 ) andFtJFt, = Zt,\Zt^. The proof is completed by repeating the same argument for the 
remaining conditional distributions. □ 

Proof of Theorem 5. From Theorem 9, we have: 

/ t /*x+c|i—s|^ 

/ C{6» exp(-A(f - s)) t L'}A^+MCds 

-OO j X — c\t — s\'i 

2rit}^ / 1 \ 

= j C{dexp{—w) j: L'}A^~*~^ (- J dm, by using w = X{t — s), 

2cw‘^C{6 exp(—w) L'}dw, which does not depend on A. (24) 

□ 



Proof of Theorem 6. We have: 

c{etY,{x)}= f f 

J —OO J X 


t /■a:-|-g(A|t-s|) 


— OO —5f(A|t—s|) 


C'{6lexp(—A(f — s)) t L'jAd^ds 


j 2g{w)C{6eyiY>{—w) t L'}A dw, by using w = X{t — s), 

/ OO 

2gr(u')C’{6lexp(—w) if L'}dw, which does not depend on A. 


(25) 

□ 


Proof of Theorem 7. This proof is similar to that of Theorem 4. From the proofs of Theorems 5 and 6 , C{9 if y’/(a;)} = 
2cw'^C{9 exp {—w) t L'i}dui and C{9 t yt^(a:)} = /g°° 2g{w)C{0 exp {—w) J Lajdru respectively. The conditions 
for (di, bi,Di) and ( 02 , 62 , ^' 2 ) are found by equating the FK triplets of Y^^{x) and Z^, and those of Y^{x) and Z^. In 
both cases, the arguments in the proof of Theorem 4 can be used to establish the validity of the chosen Levy measure and 
extend the equality in law for fixed t to equality in law as processes. □ 

Proof of Theorem 8. A one-dimensional adaptation of Theorem 3.5 in Fuchs & Stelzer (2013) says that the mixed moving 
average Xt := fg fg /(C, t — s)L(ds, d^) is mixing. Here, L is a real-valued Levy basis on S' x R, 6 S' is a varying 
parameter and / :SxR—>Risa measurable function. Using f = fx for fixed x: 

Yt{x) = [ [ lAt{x){^,s)exp{-\{t - s))L{ds,d^) = [ [ fx{C,t - s)L{ds,d(;), 

JR Jr JrJs 

where S = R, C = a: - C, and fx{C,t- s) = lt-s>o'i—g(t-s)<c<g(t-s) exp(-A(f - s)). So, is mixing. As 

mixing implies ergodicity, the process is ergodic (Fuchs & Stelzer 2013). 

Since x can act in place of t, we can also use f = ft for fixed t to obtain: 

Yt(x)= [ [ lApx)i^,s)exp{-X(t-s))L{ds,d^) = f j ft{C,,x - i)L{di,dC), 

Jr Jr Jr Js 

is mixing. Now, S = R, C = f - s, and /t(C, x - C) = lc>ol- 3 (C)<rr- 5 <g(C) exp(-AC). □ 

Proof of Theorems 10 and 11. Using the DC algorithm, simulating from Z is the same as simulating from 

/rxK ds) since L is a homogeneous Levy basis. We write f = k — h. From the Rajput & Rosinski 
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(1989) theory (via the CGF), if the stochastic integral of / over E x R is well-defined, then: 



□ 


Proof of Theorem 12. From the ergodicity of {y’t(a;)}tgR and the stationarity of the process, Ki(Y((a:)) = 

E [^((a:)] = Ki{Yt{x)) as D — > cx). Similarly, we can show that all the empirical cumulants are consistent. 

Now, let {Zt}tev. = g{{Yt{x), Yt-dt{x), • • • }) where g : R°° —> R is a measurable function. It can be shown that 
{Zt}tev. is an ergodic process (see for example. Theorem 36.4 on page 495 of Billingsley (1995)). Since %{dt) = 
(x) — Yt^{x))^ is a measurable function of (x),... }, and Yt{x) is stationary, 

%{dt) ^ l^'^\df) asm ^ oo. Thus, f^'^\dt) = (dt) ^ l^'^\dt). Fikewise, A- "/^^^da:) 

as n —> cx). Under increasing domain asymptotics where the number of spatial and temporal pairs increase, A 

'y'^'^^dt) and7(-5)(d^) 4 'y^^'>{da:). 

Using Slutsky’s theorem and the consistency of K 2 , the empirical normalised variograms are consistent. We deduce that 
the MM estimators of A, c, and the distributional parameters are consistent via the Continuous Mapping Theorem. 

Once we have the consistency of our empirical normalised variograms, we use an edited version of Theorem 3.1 on page 
70 of Fahiri et al. (2002) to deduce that all our LS estimators are consistent. Specifically, we replace the empirical and 
theoretical variograms in the Theorem by their normalised forms. Since Condition (C.l) obviously holds while Conditions 
(C.2) and (C.3) hold from the exponential structure of our normalised variograms and the choice of the identity matrix of 
the weight matrix, all the arguments in the proof apply. □ 
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1 Introduction 

This document provides the readers of Nguyen & Veraart (2016) with additional information on the theoretical background 
of the spatio-temporal Ornstein-Uhlenbeck process, computational details for Examples 2 and 8, as well as additional 
simulation and empirical results. 

In Nguyen &. Veraart (2016), we introduced the OUa process which can be interpreted as a spatio-temporal Ornstein- 
Uhlenbeck process. This is defined as a random field, lt(x), in space-time S = X x T such that: 

Yt{x) = [ exp(-A(f - s))L(dCds), (1) 

J At(x) 

where A > 0, L is a homogeneous Levy basis and j4t(x) = rio(O) -|- (x, t) is a subset of S satisfying: 

yls(x) C rit(x), V s < f, and j4t(x) n (Af x {t, cx))) = 0. 

We stated that the process is well-defined if the integral exists in the sense of Rajput & Rosinski (1989). In Section 2, we 
summarise the key concepts of this integration theory and present the conditions for OUa integrals to exist. Additionally, 
we prove that the processes we worked with in our simulations, i.e. the canonical OUa processes given by: 

Yt{x)=f f exp(-A(f - s))L(dCds), (2) 

•/—oo Jx — c\t—s\ 

where c > 0, and L is either a homogeneous Gaussian, inverse Gaussian (IG), normal inverse Gaussian (NIG) or Gamma 
basis, are well-defined. 

In Section 3, we provide the full details behind Examples 2 and 8 in Nguyen & Veraart (2016). These address the spatio- 
temporal autocorrelation structure of the canonical OUa process and the mean squared errors (MSEs) of the discrete 
convolution simulation algorithms respectively. 

In Section 5.3 and 5.4 of Nguyen & Veraart (2016), we presented the inference results of the data simulated from the 
canonical OUa process with L Gaussian and c = 1. We then mentioned that similar features were observed when we 
varied the Levy basis and the value of c. In Section 4 of this document, we display the rest of the results from applying the 
moments-matching (MM) estimation method and its least-squares (LS) adaptation to data simulated using the rectangular 
grid (RG) and diamond grid (DG) algorithms. 

In Section 6 of Nguyen & Veraart (2016), we conducted an empirical study of radiation anomaly data (International 
Research Institute for Climate and Society 2015). Using the LS inference method, we fitted two canonical OUa processes 
to the data: one with a Gaussian basis and the other with a NIG basis. We also displayed heat plots of data generated from 
these fitted models. More plots from different data runs are given in Section 5 of this document. 
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2 Stochastic integration theory and the existence of the OUa processes 


The Cq theory in Rajput & Rosinski (1989) provides us with the definition and existence conditions for stochastic inte¬ 
grals involving deterministic integrands and general Levy bases. Since we only deal with homogeneous Levy bases in 
OUa processes, we formulate our summary of the theory in this setting. Let S be the Borel cr-algebra on our space-time 
S' = R'* X E for d 6 N and Bb{S) = {E 6 S : \d+i{E) < cx)} where A^+i denotes the Lebesgue measure on B(R'*"'"^). 
We work in the probability space (fl, E, P). As is commonly done, we start with the integral of a simple function: 

Definition 1 (Stochastic integral of a simple function). 

Let {yj : j = 1, ..., m} be a set of real numbers, and {Ej : j = 1, ..., m} be a collection of disjoint sets ofBb{S). Then, 
/(x, t) = Vj 1 Ej is called a simple function on S. Here, Iej = ^ if (Xj t) £ Ej and 0 otherwise. 

We define the stochastic integral of f over A E S by fL(d^, ds) = I"! ^j)- 

Now, we extend the definition to measurable functions: 


Definition 2 (L-integrable functions, integrable functions and their stochastic integrals). 

A measurable function f : {S,S) —> (R,B(R)) is said to be L-integrable if there exists a sequence {/„} of simple 
functions such that: 

(i) fn^f almost everywhere with respect to L. 

(ii) for every A 6 S, the sequence /„L(d^, ds)} converges in probability as n ^ oo. 

If f is L-integrable, we write: 

[ /L(d^,ds) = P- lim / /„L(dC,ds). (3) 

The construction is well-defined as the limit does not depend on {/„}. 

If we are just interested in the integral of f over one set, A 6 S say, we can simplify this idea. We call f integrable over 
A if (i) holds, and the sequence f„L{d^, ds) converges in probability as n ^ oo. Similar to before, we define the 
required stochastic integral via (3). 

We have seen that the stochastic integral is defined as a limit in probability. Theorem 2.7 in Rajput & Rosinski (1989) 
provides us with explicit conditions for the existence of this limit: 


Theorem 1. Let L be a homogeneous Levy basis on (S', S) and let the Levy-Khintchine triplet of its seed be (a, b, u). The 
measurable function f : (S, S) —> (R, B(R)) is L-integrable if and only if: 


[ |C^(/(C,s))|d^ds < CX), / &|/(C,s)pdCds < CX), and ( Vo{f{^,s))d^ds<oo, 

Js Js Js 

where U{u) = ua-\- / (pjzzi) — ttp( 2 ))ic(dz), pjz) = 2 l| 2 [<i, and Lo(w) = / min(l, | 2 u|^)!^(d 2 ). 

Jm. ^ 7 r 

By virtue of the proof to the theorem in Rajput & Rosinski (1989), we can check that a stochastic integral over a chosen 
set A 6 S exists by evaluating the same conditions over A instead of S. 


Since the Levy bases we work with in our simulation studies (i.e. the homogeneous Gaussian, IG, Gamma and NIG bases) 
have finite second moments, the C 2 integration theory in Walsh (1986) tells us that our OUa processes are well-defined. 
As C 2 convergence implies convergence in probability, Rajput and Rosinski’s integration criteria should be all met. We 
check that this is true for the canonical processes. Let A = At(x) = {(^, s) : s < t,x — c\t — s\ < ^ < a: -f c|f — s|}. 
Then, the conditions we require are: 

/ \ua-\- (2u1[2„|<i - 2u1|2|<i) !a(d2)jd^ds < CX), (4) 

J A Jr 
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(5) 


/ 6exp(—2A(t — s))d^ds < oo, 
Ja 

and / / min(l, | 2 up)!^(d 2 )d^d. 

Ja Jr 


s < oo. 


where u = exp(—A(t — s)). 

Condition (5) is satisfied for all homogeneous Levy bases: 


f &exp(—2A(f — s))d^ds = f 2c{t — s)be = f 

J A J — oo Jo 


2cwbe ^^“dw = < oo, 


by using w = t — s. 

To check the other two conditions, we will use the following inequalities: 


min(l, | 2 Mp) < | 2 Mp, 
and 2 ttlpu|<i — 2 m 1 | 2 [<i < {vJ + jM|) 2 :^. 


( 6 ) 


(V) 

( 8 ) 


The first inequality is obviously true and the second can be proved by considering different relative magnitudes of \zu\ 
and \z\. 

Now, for Condition (4): 


/ ua+ ( 2 wlp„|<i - 2 m 1 | 2 |<i) ^(d 2 ) d^ds 
Ja Jr 

< J ^jual + J (2Mlp„|<i — 2m1|2|<i) i^(d2) ^ d^ds by the Triangle Inequality, 

< J ^Iwaj + J (u^ + jM|)2^^(d2)^ d^ds by ( 8 ), 


= |o| / wd^ds + 


[ z^iy{dz) 

Jr 


M^d^ds + 


[ z^u{dz) 

Jr 


w|d^ds. 


Since z^i'{dz) < oo for our Levy bases with finite second moments, and the integrand function u is both absolutely 
integrable and square-integrable, the expression is finite. So, Condition (4) holds. 

By using similar arguments and (7), we also show that Condition (6) is satisfied in our scenario: 


f f min(l, | 2 Mp)i'(d 2 )d^ds < f f | 2 Mp!^(d 2 )d^ds < [ f z‘^u{dz) 

Ja Jr Ja Jr \Jr 


u^d^ds < oo. 


Thus, the canonical processes which we used in our simulation studies are well-defined. 


3 Computational details 

In this section, we give the full proofs of the results in Examples 2 and 8 of the main paper (MP), Nguyen & Veraart 
(2016). 


3.1 Example 2 (MP) 


Proof. Let Y'j(x) be the canonical OUa process described by (2). Then with {x* ,t*) being the intersection point between 
At{x) and At+dA^ + d:,): 

Cov[Yt{x),Yt+dt{x + da:)] =yar[L']ex.p{-Xdt) f f exp(-2A(f - s))d^ds 

J —OO J X* —c\s — t* \ 
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= Var[L'] exp(—Adt) f 2cjs — i*| exp(—2A(^ — s))ds. 
J —OO 


Without loss of generality, assume that dt > 0, there are now three cases to consider: 

(i) If dx > cdt, the intersection occurs on the lower bound of At+dt {x + dx) and the upper bound of At{x). So: 

X* = X + c{t — t*) = X + dx — c{t + dt — t*) =^t* = —(2cf + cdt — dx) = t+^ — 

=> Cov[Yt{x),Yt+dt{x + dx)] = Var[L'] exp(-Adt) j 2c{t* - s)exp ^-2A “ y + ^ j 

Xd 

= VariL'J exp(- -) / 2cw exp(—2\w)dw where w = t* — s. 

C Jo 

Xdx 


CorT[Yt{x),Yt+d,{x + dx)] = exp - 


(ii) If -cdt <dx< cdt, At{x) n At+dt {x + dt) = At{x). So: 

Cow\Yt{x),Yt+dt{x + dx)] = Var[L']exp(-Adt) f 2c(f - s) exp(-2A(f - s))ds 

J —OO 
pOO 

= Var[L'] exp(—Adt) / 2c«; exp(—2Aw;)d«; where ui = f* — s. 

Jo 


Coj:j:]Yt{x),Yt+d,{x + dx)] = exp(-Adt). 


(iii) If dx < —cdt, the intersection occurs on the upper bound of At+dt (x + dx) and the lower bound of At(x). So: 

X* = X — c(t — t*) = X + dx + eft + dt — t*) t* = — (2ct + cdt + dx) = t 

2c 2 2c 

dt dx \ \ J 


Cov]Yt{x),Yt+dt{x + dx)] = Var[L'] exp(-Adt) j 2c{t* - s) exp ^-2A 

( Xd \ 

—- J J 2cw exp{—2Xw)dw where w — t 


— t — s. 


CorT]Yt{x),Yt+dt (x + d^;)] = exp ( — 

c 


In conclusion, for dt > 0, we have: 


Corr[yt(a;),yt+d,(x + d^)] := p^^''^\\dx\,dt) = 

= min (exp {—Xdt ), exp ( — 


exp(-Adt) if|da;|<cdt, 
exp ^ otherwise, 

Aldx 


3.2 Example 8 (MP) 

Proof. First, we compute the MSE of the RG algorithm. The first term on the right-hand side of (18, MP) is equal to: 

/ . fl/A R/A y 

l|iA|<jAl(t-(j;-|-r)A,t-jA]('S)l|‘^^_j^_ A_a;+iA-|-A)(0 ( Xj/Y) d^ds j 

\ •/RxK / 
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2/i 


i?/A > 

i=o ) 


2m „ 


A2 


(l_e-AA)^ 


1 + e-^^- (^ + 3)e- 


-(fl+A)A . 


(f+ i) 


g-(fl+2A)A 


When R is fixed, A^/ (l — e -'■A^^ = 1/A^ + 0(A). In addition: 


1 + - (f + 3)e-(«+^)^ + (f + l) 

2 

= ^ 2 - AA + 0(A2) + 2i?e-'^^ ^-A + 

= l-(l + Ai?)e-'^^ + 0(A). 

So, (E [Yt{x)] - E [Zt{x)]f = VA-4(1 + A7?)2e-2«^ + 0(A). 

On the other hand, since lim = 0 and lim = 0, (E[Yt{x)]—'E[Zt{x)])‘^ 

R—^oo R—yoo 

VA-4 (l - A^A^ (1 - (1 + e-^^) /2)'. 

Next, we focus on the second term on the right-hand side of (18, MP): 


3AA2 


+ 0(A2) (-2 + AA + 0(A2)) 


/ (fc(x, f, s) —/i(a:, f, s))^ Var [L'j d^ds = / k^{x,t,^,s) — 2k{x,t,^,s)h(x,t,^,s) + h^(x,t,^,s)d^ds. 

JwxR JrxR 


The mixed term k{x, t, s)h(x, t, s)d^ds involves the parts of the approximated ambit set which lie within the 
true ambit set. By reference to Figure 2(ii) (MP), this is equal to: 

fl/A j 

X X / e“^P“">l(t_(j + l)A.t-jA](s)lu+iA-A,^+^A-HA)(Od^ds 

7 = 0 i=-7 JAt{^) 


3=0 i=-j 
R/A 

-X^ 

3=0 


-XjA 


rt-jA j-x+lt-a] 


e-^(*“*)d^ds -f 


H—jA— Jx—\t—s\ 


pt-jA-^ /•x+jA+4 


/t-{j+l)A Jx-jA-%- 


-\{t—s 


>d^d4 


--2AjA 


-X 


3=0 


A 


2 jA + ^ (1 - e-V ) _ ( 2 j + l)Ae-^^ 


After computing the other terms, we find that {k{x, t, s) — h{x, t, f, s))^ Var [L'j d^ds is given by: 
^ + "£ - 1 (. - e-¥) + + ,7, .7 „a} 

3=0 ^ 

^ + f {-^ - P (' - - 5^.-“ + « -7 I)A} 


3=0 

2A2 2A2 


„-AA 


-h 


A 

+ :rT - 


1 — e 


AA 

AA 


- 1 1 f - f — + 1) -f i):e-2(«+2A)A 


A 


R 


A 


4A2a2 

(l-e-2AA)2 


2AA 


A2 2A 


A2 


AA / 1 — e 
2 


I'l _ g-2(fl+A)A^ 


2A2 


(l + 2i?A)e-2"^-bO(A) 
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by using similar arguments involving series expansions and treating R as fixed. On the other hand, if we fix A and let 
i? —> cx), this converges to: 


2A2 I I AA 




(l_g-2AA)2 


„-AA 


A 

+ trr - 


1 - — 
1 — e 2 


2AA 


A2 2A 


A2 


AA\ 1 1-e 


-2AA 


When we put the first and second terms of the MSE together, we have: 

E [\Yt{x) - Zt{xf\ = 
andE[|Ft(x)-Zt(x)| 


4/i2 


(1 + \Rf + (1 + 2Ai?) I + 0(A) for fixed R, 

zX^ 

2 ] Jt —>00 4/i2 


A4 


1 - 


A2A2 




1 / / 2 (1 - e-^^) \ 4A2A2e 

^ r ~ i AA ^ ) (1 - e-2^^) 


(l-e-^2A) 

2 A 2^-2AA 


1 + e 


-AA 


„-AA 


A 

+ AV - 


A2 2A 


_ AA 
1 — e 2 


2AA 


A2 (^) ) 1 


0-2AA 


The DG MSE and its limits can be computed similarly. Now, (E [yt(a:)] — E [Zt{x)\f' is equal to: 


j=0 2 = 0 


H-jA 

/ 2{t — jA — s)ds + 2 / {A — (t — {j + 1)A — s)) ds 

A-(j+r)A Jt-u+2)A 


^t-{j+l)A 


2fi 

4^2 


n/A 


2(i + 1)A 


2g-AjA 


3=0 


1 - 


A2A2 




) 2 

. 


Next, we compute k{x, t, s)h{x, t, s)d^ds. For the DG algorithm, the intersection of the approximated ambit 
set and the true ambit set is exactly equal to the approximated ambit set itself. Thus, by reference to Figure 4(ii) (MP), we 
can write: 




/ k{x,t,^,s)h{x,t,^,s)d^ds 

JrxR 

R/A j 

j=0 2 = 0 

R/A 

= E 2(i + 1)A 
3=0 


pt-jA px+{2i-j)A+{t-jA-s) pt-{j+l)A px+{2i-j+l)A-{t-{j+l)A-s) 

I / e-^(*-"M^ds+ / / e-^(*-*)dCds 

t-0'+l)A Jx+{2i-j)A-{t-jA-s) 


/t-(j+2)A 4x+(2i-j-l)A+(t-(j+l)A-s) 


\2g-2AjA_ 


Upon calculating the other components, we have: 
f {k{x, t, s) — h(x, t, s))^ Var [L’\ d^ds 

Tj + 2 "£o + {*” - |i (1 - e-“) 

4A2A2 


T 


1 - 


3=0 

2 




(l_g-2AA)2 


R 


A 


R 


-2AA _ / ill _|_ 2 ] g-2(i?+A)A :^g-2(it+2A)A 


A 
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By considering the behaviour of the two terms on the right-hand side of (18, MP) separately, we conclude that: 


E [|Ft(x) - 
E [\Yt{x) - Zt(x)i"] 


^{l + RfX^ + ^(l + 2\R) 


„-2RX 


+ 0(A) for fixed R, and 



4A2A2 

(I_g-2AA)2JJ ’ 


as i? —> cx). 


4 Additional simulation results 

In Nguyen & Veraart (2016), we proposed two estimation methods. The first is a spatio-temporal extension of the MM 
method in Jonsdottir et al. (2013). This uses the normalised variograms at single temporal and spatial lags to obtain A and 
c. The second method is a LS variation of the first technique. Instead of using single lags, we consider a range of lags and 
fit LS curves to the corresponding values of the normalised variograms. (We chose to use 15 temporal and spatial lags in 
our simulation experiments.) In both methods, cumulants of the process were used to estimate the Levy basis parameters. 
We compared the methods in simulation experiments where data was generated using two simulation algorithms. While 
one generates data on the usual RG, the other generates data on a DG to mimick the edges of the integration set. Model 
parameters were estimated from the simulated data. A PC with the following characteristics was used in our simulation 
study: Intel® Core™i7.3770 CPU Processor @ 3.40GHz; 8 GB of RAM; Windows 8.1 64-bit. 

The parameter estimates for the case where L is Gaussian and c = 1 were shown via box plots in Section 5 of Nguyen 
& Veraart (2016). Unlike the other model parameters for which both inference methods gave reasonable estimates, 
interesting results were obtained for the parameter c. For data generated on the RG, it was observed that c was consistently 
underestimated when the MM method was used. In comparison, for data generated on the DG, accurate values of c were 
achieved through both inference methods. For both the RG and the DG data, estimates of c were also seen to exhibit 
higher variance when the LS approach was used. 

In this section, we show that similar observations can be made when c = 2, c = 0.5 and L is a IG, NIG or Gamma basis. 
These bases are defined through their distribution for any E 6 Bb{S) in Table 1 (Jdnsdottir et al. 2013). Let f{x;0) 
denote the probability density function (PDF) of X evaluated at x, given the parameters d. The distributional parameters 
in Table 1 correspond to the following PDFs: 

(i) if A ~ IG f{x; 5,j)=6 (2Trx^) exp (^7 — 2~^ lx + 7 ^*)), for x > 0,5 > 0 ,7 > 0; 

(ii) if A ~ NIG (a, /?, /i, S), let Ki denote the modified Bessel function of third order and index 1. Then, for x 6 R, 

/(x; a, P, fi, 5) = aS ^52 -|- (x — j exp + P {x — (^\j 5"^ + {x — , where 

a, p, fi and 6 satisfy e M., 6 > 0 and 0 < |/3| < a; 

(iii) if A ~ Gamma (a, p), f (x; a, p) = /3“r (a)~^ x““^ exp (—/3x), for x > 0, a > 0 and /? > 0. As usual, T (a) 
represents the Gamma function evaluated at a. 

For each scenario, we generate 500 data sets via the RG and DG algorithms, covering the [0,10c] x [0,10] space-time 
region, Aj = 0.05, p = q = 300 and A = 1. These are the simulation settings used for the Gaussian case with c = 1. 
The basis parameter values used are given in Table 2. These are chosen so that the Levy seeds have a mean and standard 
deviation of approximately 0.2 and 0.1, allowing us to compare the results of one basis with the Gaussian and other non- 
Gaussian bases. The Levy seed distributions for the four bases are shown in Figure 1. Although they share similar first 
two moments, there are differences in other features of the distributions such as the skewness. The cumulant expressions 
in terms of the distributional parameters which are needed for the MM and LS inference are shown in Table 3. 

Figures 2-6 correspond to the estimates obtained in the cases of L Gaussian with c = 2, L Gaussian with c = 0.5, L IG, L 
Gamma and L NIG. In each figure, the estimates are arranged according to the parameters by row and the grid/inference 
method by column. 
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Figure 1: Chosen Levy seed distributions for simulated data. 


Levy seed distributions 



X 


Table 1: Distributions of the three homogeneous non-Gaussian Levy bases. Here, E 6 Bb{S) and A^+i is the Lebesgue 
measure on 


Basis 

Parameters 

Distribution 

IG 

(5,7 

L{E)^ 

IG{SXd+i{E),j) 

NIG 

a,l3,fj.,6 

L{E)^ 

^ NIG{a,l3,tiXd+i{E),6Xd+i{E)) 

Gamma 

a,(i 

L(E)^ 

^ Gamma[aXd+i{E), P) 


Table 2: True parameters for the IG, NIG and Gamma bases. 


IG 

NIG 

Gamma 

(5 = 1 

a = 20 

a = 4.3 

00 

II 

13 =-5 

j3 = 21.5 


S = 0.2 



fi = 0.27 



From the plots corresponding to the RG algorithm, we see the tendency of the MM method to underestimate c and 
notice that the LS estimates for c have greater standard deviations than the MM estimates. For example, in the case of 
L Gaussian and c = 2, c has a range of about 0.2 under the MM estimation in Figure 2(f) and and a range of about 0.7 
under the LS approach in Figure 2(e). This increase in variance under the LS approach also had the effect of centering 
the c estimates about their true values (given by the red lines). Note that for the NIG basis, 179 invalid estimates for a 
and 204 invalid estimates for 6 were arose due to the parameter constraints and the mismatch between the empirical and 
theoretical cumulants of L'. 

Unlike for the RG, accurate estimates of c are obtained from both inference methods for the DG data. We see this from 
the centering of c about the true values (denoted by the red lines) in each scenario. Increased variance in our c estimates 
is also observed when we apply a LS approach to our estimation method. For example, for the case of L Gaussian and 
c = 2, c has a range of about 1.8 in Figure 2(g) which is much larger than 0.14, the range of c under the original MM 
method in Figure 2(h). 

More comments on these observations can be found in Section 5 of Nguyen & Veraart (2016). 
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Table 3: The cumulants ki{L'), for Z = 1,..., 4, in terms of the Levy seed parameters when L is IG, NIG and Gamma. 


Cumulant 

IG 

NIG 

Gamma 

«i(L') 

5h 

p + 5/3/(a2 -/52)1/2 

a/P 

K2{L') 


5aV(a2 - ^2)3/2 



3J/7® 

35/3aV(a2 - /32)5/2 

2all3^ 

K4,{L') 

155 / 7 '’’ 

35(a2+4^2)^2/(^2_ ^2)7/2 

6a//3^ 


Figure 2: Box plots of the parameter estimates from the LS and MM methods under the RG and DG algorithms in the 
case c = 2 and L being a Gaussian basis. The true values are denoted by the red lines. 
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Figure 3: Box plots of the parameter estimates from the LS and MM methods under the RG and DG algorithms in the 
case c = 0.5 and L being a Gaussian basis. The true values are denoted by the red lines. 
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Figure 4: Box plots of the parameter estimates from the LS and MM methods under the RG and DG algorithms in the 
case c = 1 and L being an IG basis. The true values are denoted by the red lines. 
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Figure 5: Box plots of the parameter estimates from the LS and MM methods under the RG and DG algorithms in the 
case c = 1 and L being a Gamma basis. The true values are denoted by the red lines. 
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Figure 6: Box plots of the parameter estimates from the LS and MM methods under the RG and DG algorithms in the 
case c = 1 and L being a NIG basis. The true values are denoted by the red lines. The 179 invalid estimates for a and 
204 invalid estimates for 5 are omitted in the plots. 
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5 Additional empirical results 


In this section, we show the heat plots of the data generated from the canonical OUa processes fitted to anomaly data of 
outgoing longwave radiation (OLR) from the International Research Institute for Climate and Society. The simulated data 
were generated using the DG algorithm and alternate coordinates were used for the plots in order to avoid the missing 
values. In each of the plots, the heat plot of the OLR data and the heat plot of the corresponding data run are placed side 
by side to facilitate comparison. The radiation anomalies are measured in watts per square metre. Due to the random 
nature of the simulations, we cannot judge the appropriateness of the fitted models based on individual runs. However, as 
diagonal lines seem to outline diamond-like structures in our simulated data, one can say that the fitted canonical OUa 
processes give linear approximations to the spatio-temporal dependencies observed in the OLR phenomena. 


Figure 7: Heat plot of: (i) the radiation anomaly data; (ii) the Gaussian simulated data, seed 2. 
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Figure 8: Heat plot of: (i) the radiation anomaly data; (ii) the NIG simulated data, seed 2. 
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Figure 9: Heat plot of: (i) the radiation anomaly data; (ii) the Gaussian simulated data, seed 3. 
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Figure 10: Heat plot of: (i) the radiation anomaly data; (ii) the NIG simulated data, seed 3. 
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Figure 11: Heat plot of: (i) the radiation anomaly data; (ii) the Gaussian simulated data, seed 4. 
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Figure 12: Heat plot of: (i) the radiation anomaly data; (ii) the NIG simulated data, seed 4. 
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Figure 13: Heat plot of: (i) the radiation anomaly data; (ii) the Gaussian simulated data, seed 5. 
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Figure 14: Heat plot of: (i) the radiation anomaly data; (ii) the NIG simulated data, seed 5. 
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Figure 15: Heat plot of: (i) the radiation anomaly data; (ii) the Gaussian simulated data, seed 6. 
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Figure 16: Heat plot of: (i) the radiation anomaly data; (ii) the NIG simulated data, seed 6. 
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Figure 17: Heat plot of: (i) the radiation anomaly data; (ii) the Gaussian simulated data, seed 7. 
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Figure 18: Heat plot of: (i) the radiation anomaly data; (ii) the NIG simulated data, seed 7. 
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Figure 19: Heat plot of: (i) the radiation anomaly data; (ii) the Gaussian simulated data, seed ! 

__ (ii) Gaussian Simuiated Data 

(i) OLR Data ' ' 



-50 


I 


300 - 


50 250 - 


200 - 


150 


100 - 


50 


10 


20 


30 


50 


I 


-50 


40 50 


Figure 20: Heat plot of: (i) the radiation anomaly data; (ii) the NIG simulated data, seed 8. 
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Figure 21: Heat plot of: (i) the radiation anomaly data; (ii) the Gaussian simulated data, seed 9. 
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Figure 22: Heat plot of: (i) the radiation anomaly data; (ii) the NIG simulated data, seed 9. 
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Figure 23: Heat plot of: (i) the radiation anomaly data; (ii) the Gaussian simulated data, seed 10. 
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Figure 24: Heat plot of: (i) the radiation anomaly data; (ii) the NIG simulated data, seed 10. 
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